SEQUENCE LISTING



(1) GENERAL INFORMATION: 



(i) APPLICANT: MARIA DIAZ-TORRES

NIGEL DUNN-COLEMAN 
MATTHEW CHASE 



(ii) TITLE OF INVENTION: METHOD FOR THE RECOMBINANT PRODUCTION OF 1,3-PROPANEDIOL



(iii) NUMBER OF SEQUENCES: 49

(iv) CORRESPONDENCE ADDRESS: 



(A) ADDRESSEE: GENENCOR INTERNATIONAL, INC. 

(B) STREET: 4 CAMBRIDGE PLACE 

1870 SOUTH WINTON ROAD

(C) CITY: ROCHESTER 

(D) STATE: NEW YORK 

(E) COUNTRY: U.S.A. 

(F) POSTAL CODE (ZIP): 14618

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: 3.50 INCH DISKETTE 

(B) COMPUTER: IBM PC COMPATIBLE 

(C) OPERATING SYSTEM: MICROSOFT WINDOWS 3.1 

(D) SOFTWARE: MICROSOFT WORD 2.0C 



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 11/13/97 

(C) CLASSIFICATION: 



(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 60/030,601 

(B) FILING DATE: 11/13/96 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: GLAISTER, DEBRA 

(B) REGISTRATION NO.: 33,888 

(C) REFERENCE/DOCKET NUMBER: GC 369-2

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 650-864-7620 

(B) TELEFAX: 650-845-6504 



(2) INFORMATION FOR SEQ ID NO:1:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1668 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ATGAAAAGAT CAAAACGATT TGCAGTACTG GCCCAGCGCC CCGTCAATCA GGACGGGCTG 60

ATTGGCGAGT GGCCTGAAGA GGGGCTGATC GCCATGGACA GCCCCTTTGA CCCGGTCTCT 120

TCAGTAAAAG TGGACAACGG TCTGATCGTC GAACTGGACG GCAAACGCCG GGACCAGTTT 180

GACATGATCG ACCGATTTAT CGCCGATTAC GCGATCAACG TTGAGCGCAC AGAGCAGGCA 240

ATGCGCCTGG AGGCGGTGGA AATAGCCCGT ATGCTGGTGG ATATTCACGT CAGCCGGGAG 300

GAGATCATTG CCATCACTAC CGCCATCACG CCGGCCAAAG CGGTCGAGGT GATGGCGCAG 360

ATGAACGTGG TGGAGATGAT GATGGCGCTG CAGAAGATGC GTGCCCGCCG GACCCCCTCC 420

AACCAGTGCC ACGTCACCAA TCTCAAAGAT AATCCGGTGC AGATTGCCGC TGACGCCGCC 480

GAGGCCGGGA TCCGCGGCTT CTCAGAACAG GAGACCACGG TCGGTATCGC GCGCTACGCG 540

CCGTTTAACG CCCTGGCGCT GTTGGTCGGT TCGCAGTGCG GCCGCCCCGG CGTGTTGACG 600

CAGTGCTCGG TGGAAGAGGC CACCGAGCTG GAGCTGGGCA TGCGTGGCTT AACCAGCTAC 660

GCCGAGACGG TGTCGGTCTA CGGCACCGAA GCGGTATTTA CCGACGGCGA TGATACGCCG 720

TGGTCAAAGG CGTTCCTCGC CTCGGCCTAC GCCTCCCGCG GGTTGAAAAT GCGCTACACC 780

TCCGGCACCG GATCCGAAGC GCTGATGGGC TATTCGGAGA GCAAGTCGAT GCTCTACCTC 840

GAATCGCGCT GCATCTTCAT TACTAAAGGC GCCGGGGTTC AGGGACTGCA AAACGGCGCG 900

GTGAGCTGTA TCGGCATGAC CGGCGCTGTG CCGTCGGGCA TTCGGGCGGT GCTGGCGGAA 960

AACCTGATCG CCTCTATGCT CGACCTCGAA GTGGCGTCCG CCAACGACCA GACTTTCTCC 1020

CACTCGGATA TTCGCCGCAC CGCGCGCACC CTGATGCAGA TGCTGCCGGG CACCGACTTT 1080

ATTTTCTCCG GCTACAGCGC GGTGCCGAAC TACGACAACA TGTTCGCCGG CTCGAACTTC 1140

GATGCGGAAG ATTTTGATGA TTACAACATC CTGCAGCGTG ACCTGATGGT TGACGGCGGC 1200

CTGCGTCCGG TGACCGAGGC GGAAACCATT GCCATTCGCC AGAAAGCGGC GCGGGCGATC 1260

CAGGCGGTTT TCCGCGAGCT GGGGCTGCCG CCAATCGCCG ACGAGGAGGT GGAGGCCGCC 1320

ACCTACGCGC ACGGCAGCAA CGAGATGCCG CCGCGTAACG TGGTGGAGGA TCTGAGTGCG 1380

GTGGAAGAGA TGATGAAGCG CAACATCACC GGCCTCGATA TTGTCGGCGC GCTGAGCCGC 1440

AGCGGCTTTG AGGATATCGC CAGCAATATT CTCAATATGC TGCGCCAGCG GGTCACCGGC 1500

GATTACCTGC AGACCTCGGC CATTCTCGAT CGGCAGTTCG AGGTGGTGAG TGCGGTCAAC 1560

GACATCAATG ACTATCAGGG GCCGGGCACC GGCTATCGCA TCTCTGCCGA ACGCTGGGCG 1620

GAGATCAAAA ATATTCCGGG CGTGGTTCAG CCCGACACCA TTGAATAA 1668

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 585 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAB2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

GTGCAACAGA CAACCCAAAT TCAGCCCTCT TTTACCCTGA AAACCCGCGA GGGCGGGGTA 60

GCTTCTGCCG ATGAACGCGC CGATGAAGTG GTGATCGGCG TCGGCCCTGC CTTCGATAAA 120

CACCAGCATC ACACTCTGAT CGATATGCCC CATGGCGCGA TCCTCAAAGA GCTGATTGCC 180

GGGGTGGAAG AAGAGGGGCT TCACGCCCGG GTGGTGCGCA TTCTGCGCAC GTCCGACGTC 240

TCCTTTATGG CCTGGGATGC GGCCAACCTG AGCGGCTCGG GGATCGGCAT CGGTATCCAG 300

TCGAAGGGGA CCACGGTCAT CCATCAGCGC GATCTGCTGC CGCTCAGCAA CCTGGAGCTG 360

TTCTCCCAGG CGCCGCTGCT GACGCTGGAG ACCTACCGGC AGATTGGCAA AAACGCTGCG 420

CGCTATGCGC GCAAAGAGTC ACCTTCGCCG GTGCCGGTGG TGAACGATCA GATGGTGCGG 480

CCGAAATTTA TGGCCAAAGC CGCGCTATTT CATATCAAAG AGACCAAACA TGTGGTGCAG 540

GACGCCGAGC CCGTCACCCT GCACATCGAC TTAGTAAGGG AGTGA 585

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 426 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAB3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ATGAGCGAGA AAACCATGCG CGTGCAGGAT TATCCGTTAG CCACCCGCTG CCCGGAGCAT 60

ATCCTGACGC CTACCGGCAA ACCATTGACC GATATTACCC TCGAGAAGGT GCTCTCTGGC 120

GAGGTGGGCC CGCAGGATGT GCGGATCTCC CGCCAGACCC TTGAGTACCA GGCGCAGATT 180

GCCGAGCAGA TGCAGCGCCA TGCGGTGGCG CGCAATTTCC GCCGCGCGGC GGAGCTTATC 240

GCCATTCCTG ACGAGCGCAT TCTGGCTATC TATAACGCGC TGCGCCCGTT CCGCTCCTCG 300

CAGGCGGAGC TGCTGGCGAT CGCCGACGAG CTGGAGCACA CCTGGCATGC GACAGTGAAT 360

GCCGCCTTTG TCCGGGAGTC GGCGGAAGTG TATCAGCAGC GGCATAAGCT GCGTAAAGGA 420

AGCTAA 426

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1164 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAT 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

ATGAGCTATC GTATGTTTGA TTATCTGGTG CCAAACGTTA ACTTTTTTGG CCCCAACGCC 60

ATTTCCGTAG TCGGCGAACG CTGCCAGCTG CTGGGGGGGA AAAAAGCCCT GCTGGTCACC 120

GACAAAGGCC TGCGGGCAAT TAAAGATGGC GCGGTGGACA AAACCCTGCA TTATCTGCGG 180

GAGGCCGGGA TCGAGGTGGC GATCTTTGAC GGCGTCGAGC CGAACCCGAA AGACACCAAC 240

GTGCGCGACG GCCTCGCCGT GTTTCGCCGC GAACAGTGCG ACATCATCGT CACCGTGGGC 300

GGCGGCAGCC CGCACGATTG CGGCAAAGGC ATCGGCATCG CCGCCACCCA TGAGGGCGAT 360

CTGTACCAGT ATGCCGGAAT CGAGACCCTG ACCAACCCGC TGCCGCCTAT CGTCGCGGTC 420

AATACCACCG CCGGCACCGC CAGCGAGGTC ACCCGCCACT GCGTCCTGAC CAACACCGAA 480

ACCAAAGTGA AGTTTGTGAT CGTCAGCTGG CGCAAACTGC CGTCGGTCTC TATCAACGAT 540

CCACTGCTGA TGATCGGTAA ACCGGCCGCC CTGACCGCGG CGACCGGGAT GGATGCCCTG 600

ACCCACGCCG TAGAGGCCTA TATCTCCAAA GACGCTAACC CGGTGACGGA CGCCGCCGCC 660

ATGCAGGCGA TCCGCCTCAT CGCCCGCAAC CTGCGCCAGG CCGTGGCCCT CGGCAGCAAT 720

CTGCAGGCGC GGGAAAACAT GGCCTATGCT TCTCTGCTGG CCGGGATGGC TTTCAATAAC 780

GCCAACCTCG GCTACGTGCA CGCCATGGCG CACCAGCTGG GCGGCCTGTA CGACATGCCG 840

CACGGCGTGG CCAACGCTGT CCTGCTGCCG CATGTGGCGC GCTACAACCT GATCGCCAAC 900

CCGGAGAAAT TCGCCGATAT CGCTGAACTG ATGGGCGAAA ATATCACCGG ACTGTCCACT 960

CTCGACGCGG CGGAAAAAGC CATCGCCGCT ATCACGCGTC TGTCGATGGA TATCGGTATT 1020

CCGCAGCATC TGCGCGATCT GGGGGTAAAA GAGGCCGACT TCCCCTACAT GGCGGAGATG 1080

GCTCTAAAAG ACGGCAATGC GTTCTCGAAC CCGCGTAAAG GCAACGAGCA GGAGATTGCC 1140

GCGATTTTCC GCCAGGCATT CTGA 1164

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1380 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPD1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CTTTAATTTT CTTTTATCTT ACTCTCCTAC ATAAGACATC AAGAAACAAT TGTATATTGT 60

ACACCCCCCC CCTCCACAAA CACAAATATT GATAATATAA AGATGTCTGC TGCTGCTGAT 120

AGATTAAACT TAACTTCCGG CCACTTGAAT GCTGGTAGAA AGAGAAGTTC CTCTTCTGTT 180

TCTTTGAAGG CTGCCGAAAA GCCTTTCAAG GTTACTGTGA TTGGATCTGG TAACTGGGGT 240

ACTACTATTG CCAAGGTGGT TGCCGAAAAT TGTAAGGGAT ACCCAGAAGT TTTCGCTCCA 300

ATAGTACAAA TGTGGGTGTT CGAAGAAGAG ATCAATGGTG AAAAATTGAC TGAAATCATA 360

AATACTAGAC ATCAAAACGT GAAATACTTG CCTGGCATCA CTCTACCCGA CAATTTGGTT 420

GCTAATCCAG ACTTGATTGA TTCAGTCAAG GATGTCGACA TCATCGTTTT CAACATTCCA 480

CATCAATTTT TGCCCCGTAT CTGTAGCCAA TTGAAAGGTC ATGTTGATTC ACACGTCAGA 540

GCTATCTCCT GTCTAAAGGG TTTTGAAGTT GGTGCTAAAG GTGTCCAATT GCTATCCTCT 600

TACATCACTG AGGAACTAGG TATTCAATGT GGTGCTCTAT CTGGTGCTAA CATTGCCACC 660

GAAGTCGCTC AAGAACACTG GTCTGAAACA ACAGTTGCTT ACCACATTCC AAAGGATTTC 720

AGAGGCGAGG GCAAGGACGT CGACCATAAG GTTCTAAAGG CCTTGTTCCA CAGACCTTAC 780

TTCCACGTTA GTGTCATCGA AGATGTTGCT GGTATCTCCA TCTGTGGTGC TTTGAAGAAC 840

GTTGTTGCCT TAGGTTGTGG TTTCGTCGAA GGTCTAGGCT GGGGTAACAA CGCTTCTGCT 900

GCCATCCAAA GAGTCGGTTT GGGTGAGATC ATCAGATTCG GTCAAATGTT TTTCCCAGAA 960

TCTAGAGAAG AAACATACTA CCAAGAGTCT GCTGGTGTTG CTGATTTGAT CACCACCTGC 1020

GCTGGTGGTA GAAACGTCAA GGTTGCTAGG CTAATGGCTA CTTCTGGTAA GGACGCCTGG 1080

GAATGTGAAA AGGAGTTGTT GAATGGCCAA TCCGCTCAAG GTTTAATTAC CTGCAAAGAA 1140

GTTCACGAAT GGTTGGAAAC ATGTGGCTCT GTCGAAGACT TCCCATTATT TGAAGCCGTA 1200

TACCAAATCG TTTACAACAA CTACCCAATG AAGAACCTGC CGGACATGAT TGAAGAATTA 1260

GATCTACATG AAGATTAGAT TTATTGGAGA AAGATAACAT ATCATACTTC CCCCACTTTT 1320

TTCGAGGCTC TTCTATATCA TATTCATAAA TTAGCATTAT GTCATTTCTC ATAACTACTT 1380

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2946 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPD2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

GAATTCGAGC CTGAAGTGCT GATTACCTTC AGGTAGACTT CATCTTGACC CATCAACCCC 60

AGCGTCAATC CTGCAAATAC ACCACCCAGC AGCACTAGGA TGATAGAGAT AATATAGTAC 120

GTGGTAACGC TTGCCTCATC ACCTACGCTA TGGCCGGAAT CGGCAACATC CCTAGAATTG 180

AGTACGTGTG ATCCGGATAA CAACGGCAGT GAATATATCT TCGGTATCGT AAAGATGTGA 240

TATAAGATGA TGTATACCCA ATGAGGAGCG CCTGATCGTG ACCTAGACCT TAGTGGCAAA 300

AACGACATAT CTATTATAGT GGGGAGAGTT TCGTGCAAAT AACAGACGCA GCAGCAAGTA 360

ACTGTGACGA TATCAACTCT TTTTTTATTA TGTAATAAGC AAACAAGCAC GAATGGGGAA 420

AGCCTATGTG CAATCACCAA GGTCGTCCCT TTTTTCCCAT TTGCTAATTT AGAATTTAAA 480

GAAACCAAAA GAATGAAGAA AGAAAACAAA TACTAGCCCT AACCCTGACT TCGTTTCTAT 540

GATAATACCC TGCTTTAATG AACGGTATGC CCTAGGGTAT ATCTCACTCT GTACGTTACA 600

AACTCCGGTT ATTTTATCGG AACATCCGAG CACCCGCGCC TTCCTCAACC CAGGCACCGC 660

CCCAGGTAAC CGTGCGCGAT GAGCTAATCC TGAGCCATCA CCCACCCCAC CCGTTGATGA 720

CAGCAATTCG GGAGGGCGAA AATAAAACTG GAGCAAGGAA TTACCATCAC CGTCACCATC 780

ACCATCATAT CGCCTTAGCC TCTAGCCATA GCCATCATGC AAGCGTGTAT CTTCTAAGAT 840

TCAGTCATCA TCATTACCGA GTTTGTTTTC CTTCACATGA TGAAGAAGGT TTGAGTATGC 900

TCGAAACAAT AAGACGACGA TGGCTCTGCC ATTGGTTATA TTACGCTTTT GCGGCGAGGT 960

GCCGATGGGT TGCTGAGGGG AAGAGTGTTT AGCTTACGGA CCTATTGCCA TTGTTATTCC 1020

GATTAATCTA TTGTTCAGCA GCTCTTCTCT ACCCTGTCAT TCTAGTATTT TTTTTTTTTT 1080

TTTTTGGTTT TACTTTTTTT TCTTCTTGCC TTTTTTTCTT GTTACTTTTT TTCTAGTTTT 1140

TTTTCCTTCC ACTAAGCTTT TTCCTTGATT TATCCTTGGG TTCTTCTTTC TACTCCTTTA 1200

GATTTTTTTT TTATATATTA ATTTTTAAGT TTATGTATTT TGGTAGATTC AATTCTCTTT 1260

CCCTTTCCTT TTCCTTCGCT CCCCTTCCTT ATCAATGCTT GCTGTCAGAA GATTAACAAG 1320

ATACACATTC CTTAAGCGAA CGCATCCGGT GTTATATACT CGTCGTGCAT ATAAAATTTT 1380

GCCTTCAAGA TCTACTTTCC TAAGAAGATC ATTATTACAA ACACAACTGC ACTCAAAGAT 1440

GACTGCTCAT ACTAATATCA AACAGCACAA ACACTGTCAT GAGGACCATC CTATCAGAAG 1500

ATCGGACTCT GCCGTGTCAA TTGTACATTT GAAACGTGCG CCCTTCAAGG TTACAGTGAT 1560

TGGTTCTGGT AACTGGGGGA CCACCATCGC CAAAGTCATT GCGGAAAACA CAGAATTGCA 1620

TTCCCATATC TTCGAGCCAG AGGTGAGAAT GTGGGTTTTT GATGAAAAGA TCGGCGACGA 1680

AAATCTGACG GATATCATAA ATACAAGACA CCAGAACGTT AAATATCTAC CCAATATTGA 1740

CCTGCCCCAT AATCTAGTGG CCGATCCTGA TCTTTTACAC TCCATCAAGG GTGCTGACAT 1800

CCTTGTTTTC AACATCCCTC ATCAATTTTT ACCAAACATA GTCAAACAAT TGCAAGGCCA 1860

CGTGGCCCCT CATGTAAGGG CCATCTCGTG TCTAAAAGGG TTCGAGTTGG GCTCCAAGGG 1920

TGTGCAATTG CTATCCTCCT ATGTTACTGA TGAGTTAGGA ATCCAATGTG GCGCACTATC 1980

TGGTGCAAAC TTGGCACCGG AAGTGGCCAA GGAGCATTGG TCCGAAACCA CCGTGGCTTA 2040

CCAACTACCA AAGGATTATC AAGGTGATGG CAAGGATGTA GATCATAAGA TTTTGAAATT 2100

GCTGTTCCAC AGACCTTACT TCCACGTCAA TGTCATCGAT GATGTTGCTG GTATATCCAT 2160

TGCCGGTGCC TTGAAGAACG TCGTGGCACT TGCATGTGGT TTCGTAGAAG GTATGGGATG 2220

GGGTAACAAT GCCTCCGCAG CCATTCAAAG GCTGGGTTTA GGTGAAATTA TCAAGTTCGG 2280

TAGAATGTTT TTCCCAGAAT CCAAAGTCGA GACCTACTAT CAAGAATCCG CTGGTGTTGC 2340

AGATCTGATC ACCACCTGCT CAGGCGGTAG AAACGTCAAG GTTGCCACAT ACATGGCCAA 2400

GACCGGTAAG TCAGCCTTGG AAGCAGAAAA GGAATTGCTT AACGGTCAAT CCGCCCAAGG 2460

GATAATCACA TGCAGAGAAG TTCACGAGTG GCTACAAACA TGTGAGTTGA CCCAAGAATT 2520

CCCAATTATT CGAGGCAGTC TACCAGATAG TCTACAACAA CGTCCGCATG GAAGACCTAC 2580

CGGAGATGAT TGAAGAGCTA GACATCGATG ACGAATAGAC ACTCTCCCCC CCCCTCCCCC 2640

TCTGATCTTT CCTGTTGCCT CTTTTTCCCC CAACCAATTT ATCATTATAC ACAAGTTCTA 2700

CAACTACTAC TAGTAACATT ACTACAGTTA TTATAATTTT CTATTCTCTT TTTCTTTAAG 2760

AATCTATCAT TAACGTTAAT TTCTATATAT ACATAACTAC CATTATACAC GCTATTATCG 2820

TTTACATATC ACATCACCGT TAATGAAAGA TACGACACCC TGTACACTAA CACAATTAAA 2880

TAATCGCCAT AACCTTTTCT GTTATCTATA GCCCTTAAAG CTGTTTCTTC GAGCTTTTCA 2940

CTGCAG 2946

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3178 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GUT2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CTGCAGAACT TCGTCTGCTC TGTGCCCATC CTCGCGGTTA GAAAGAAGCT GAATTGTTTC 60

ATGCGCAAGG GCATCAGCGA GTGACCAATA ATCACTGCAC TAATTCCTTT TTAGCAACAC 120

ATACTTATAT ACAGCACCAG ACCTTATGTC TTTTCTCTGC TCCGATACGT TATCCCACCC 180

AACTTTTATT TCAGTTTTGG CAGGGGAAAT TTCACAACCC CGCACGCTAA AAATCGTATT 240

TAAACTTAAA AGAGAACAGC CACAAATAGG GAACTTTGGT CTAAACGAAG GACTCTCCCT 300

CCCTTATCTT GACCGTGCTA TTGCCATCAC TGCTACAAGA CTAAATACGT ACTAATATAT 360

GTTTTCGGTA ACGAGAAGAA GAGCTGCCGG TGCAGCTGCT GCCATGGCCA CAGCCACGGG 420

GACGCTGTAC TGGATGACTA GCCAAGGTGA TAGGCCGTTA GTGCACAATG ACCCGAGCTA 480

CATGGTGCAA TTCCCCACCG CCGCTCCACC GGCAGGTCTC TAGACGAGAC CTGCTGGACC 540

GTCTGGACAA GACGCATCAA TTCGACGTGT TGATCATCGG TGGCGGGGCC ACGGGGACAG 600

GATGTGCCCT AGATGCTGCG ACCAGGGGAC TCAATGTGGC CCTTGTTGAA AAGGGGGATT 660

TTGCCTCGGG AACGTCGTCC AAATCTACCA AGATGATTCA CGGTGGGGTG CGGTACTTAG 720

AGAAGGCCTT CTGGGAGTTC TCCAAGGCAC AACTGGATCT GGTCATCGAG GCACTCAACG 780

AGCGTAAACA TCTTATCAAC ACTGCCCCTC ACCTGTGCAC GGTGCTACCA ATTCTGATCC 840

CCATCTACAG CACCTGGCAG GTCCCGTACA TCTATATGGG CTGTAAATTC TACGATTTCT 900

TTGGCGGTTC CCAAAACTTG AAAAAATCAT ACCTACTGTC CAAATCCGCC ACCGTGGAGA 960

AGGCTCCCAT GCTTACCACA GACAATTTAA AGGCCTCGCT TGTGTACCAT GATGGGTCCT 1020

TTAACGACTC GCGTTTGAAC GCCACTTTAG CCATCACGGG TGTGGAGAAC GGCGCTACCG 1080

TCTTGATCTA TGTCGAGGTA CAAAAATTGA TCAAAGACCC AACTTCTGGT AAGGTTATCG 1140

GTGCCGAGGC CCGGGACGTT GAGACTAATG AGCTTGTCAG AATCAACGCT AAATGTGTGG 1200

TCAATGCCAC GGGCCCATAC AGTGACGCCA TTTTGCAAAT GGACCGCAAC CCATCCGGTC 1260

TGCCGGACTC CCCGCTAAAC GACAACTCCA AGATCAAGTC GACTTTCAAT CAAATCTCCG 1320

TCATGGACCC GAAAATGGTC ATCCCATCTA TTGGCGTTCA CATCGTATTG CCCTCTTTTT 1380

ACTCCCCGAA GGATATGGGT TTGTTGGACG TCAGAACCTC TGATGGCAGA GTGATGTTCT 1440

TTTTACCTTG GCAGGGCAAA GTCCTTGCCG GCACCACAGA CATCCCACTA AAGCAAGTCC 1500

CAGAAAACCC TATGCCTACA GAGGCTGATA TTCAAGATAT CTTGAAAGAA CTACAGCACT 1560

ATATCGAATT CCCCGTGAAA AGAGAAGACG TGCTAAGTGC ATGGGCTGGT GTCAGACCTT 1620

TGGTCAGAGA TCCACGTACA ATCCCCGCAG ACGGGAAGAA GGGCTCTGCC ACTCAGGGCG 1680

TGGTAAGATC CCACTTCTTG TTCACTTCGG ATAATGGCCT AATTACTATT GCAGGTGGTA 1740

AATGGACTAC TTACAGACAA ATGGCTGAGG AAACAGTCGA CAAAGTTGTC GAAGTTGGCG 1800

GATTCCACAA CCTGAAACCT TGTCACACAA GAGATATTAA GCTTGCTGGT GCAGAAGAAT 1860

GGACGCAAAA CTATGTGGCT TTATTGGCTC AAAACTACCA TTTATCATCA AAAATGTCCA 1920

ACTACTTGGT TCAAAACTAC GGAACCCGTT CCTCTATCAT TTGCGAATTT TTCAAAGAAT 1980

CCATGGAAAA TAAACTGCCT TTGTCCTTAG CCGACAAGGA AAATAACGTA ATCTACTCTA 2040

GCGAGGAGAA CAACTTGGTC AATTTTGATA CTTTCAGATA TCCATTCACA ATCGGTGAGT 2100

TAAAGTATTC CATGCAGTAC GAATATTGTA GAACTCCCTT GGACTTCCTT TTAAGAAGAA 2160

CAAGATTCGC CTTCTTGGAC GCCAAGGAAG CTTTGAATGC CGTGCATGCC ACCGTCAAAG 2220

TTATGGGTGA TGAGTTCAAT TGGTCGGAGA AAAAGAGGCA GTGGGAACTT GAAAAAACTG 2280

TGAACTTCAT CCAAGGACGT TTCGGTGTCT AAATCGATCA TGATAGTTAA GGGTGACAAA 2340

GATAACATTC ACAAGAGTAA TAATAATGGT AATGATGATA ATAATAATAA TGATAGTAAT 2400

AACAATAATA ATAATGGTGG TAATGGCAAT GAAATCGCTA TTATTACCTA TTTTCCTTAA 2460

TGGAAGAGTT AAAGTAAACT AAAAAAACTA CAAAAATATA TGAAGAAAAA AAAAAAAAGA 2520

GGTAATAGAC TCTACTACTA CAATTGATCT TCAAATTATG ACCTTCCTAG TGTTTATATT 2580

CTATTTCCAA TACATAATAT AATCTATATA ATCATTGCTG GTAGACTTCC GTTTTAATAT 2640

CGTTTTAATT ATCCCCTTTA TCTCTAGTCT AGTTTTATCA TAAAATATAG AAACACTAAA 2700

TAATATTCTT CAAACGGTCC TGGTGCATAC GCAATACATA TTTATGGTGC AAAAAAAAAA 2760

ATGGAAAATT TTGCTAGTCA TAAACCCTTT CATAAAACAA TACGTAGACA TCGCTACTTG 2820

AAATTTTCAA GTTTTTATCA GATCCATGTT TCCTATCTGC CTTGACAACC TCATCGTCGA 2880

AATAGTACCA TTTAGAACGC CCAATATTCA CATTGTGTTC AAGGTCTTTA TTCACCAGTG 2940

ACGTGTAATG GCCATGATTA ATGTGCCTGT ATGGTTAACC ACTCCAAATA GCTTATATTT 3000

CATAGTGTCA TTGTTTTTCA ATATAATGTT TAGTATCAAT GGATATGTTA CGACGGTGTT 3060

ATTTTTCTTG GTCAAATCGT AATAAAATCT CGATAAATGG ATGACTAAGA TTTTTGGTAA 3120

AGTTACAAAA TTTATCGTTT TCACTGTTGT CAATTTTTTG TTCTTGTAAT CACTCGAG 3178

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 816 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPP1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

ATGAAACGTT TCAATGTTTT AAAATATATC AGAACAACAA AAGCAAATAT ACAAACCATC 60

GCAATGCCTT TGACCACAAA ACCTTTATCT TTGAAAATCA ACGCCGCTCT ATTCGATGTT 120

GACGGTACCA TCATCATCTC TCAACCAGCC ATTGCTGCTT TCTGGAGAGA TTTCGGTAAA 180

GACAAGCCTT ACTTCGATGC CGAACACGTT ATTCACATCT CTCACGGTTG GAGAACTTAC 240

GATGCCATTG CCAAGTTCGC TCCAGACTTT GCTGATGAAG AATACGTTAA CAAGCTAGAA 300

GGTGAAATCC CAGAAAAGTA CGGTGAACAC TCCATCGAAG TTCCAGGTGC TGTCAAGTTG 360

TGTAATGCTT TGAACGCCTT GCCAAAGGAA AAATGGGCTG TCGCCACCTC TGGTACCCGT 420

GACATGGCCA AGAAATGGTT CGACATTTTG AAGATCAAGA GACCAGAATA CTTCATCACC 480

GCCAATGATG TCAAGCAAGG TAAGCCTCAC CCAGAACCAT ACTTAAAGGG TAGAAACGGT 540

TTGGGTTTCC CAATTAATGA ACAAGACCCA TCCAAATCTA AGGTTGTTGT CTTTGAAGAC 600

GCACCAGCTG GTATTGCTGC TGGTAAGGCT GCTGGCTGTA AAATCGTTGG TATTGCTACC 660

ACTTTCGATT TGGACTTCTT GAAGGAAAAG GGTTGTGACA TCATTGTCAA GAACCACGAA 720

TCTATCAGAG TCGGTGAATA CAACGCTGAA ACCGATGAAG TCGAATTGAT CTTTGATGAC 780

TACTTATACG CTAAGGATGA CTTGTTGAAA TGGTAA 816

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 753 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPP2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

ATGGGATTGA CTACTAAACC TCTATCTTTG AAAGTTAACG CCGCTTTGTT CGACGTCGAC 60

GGTACCATTA TCATCTCTCA ACCAGCCATT GCTGCATTCT GGAGGGATTT CGGTAAGGAC 120

AAACCTTATT TCGATGCTGA ACACGTTATC CAAGTCTCGC ATGGTTGGAG AACGTTTGAT 180

GCCATTGCTA AGTTCGCTCC AGACTTTGCC AATGAAGAGT ATGTTAACAA ATTAGAAGCT 240

GAAATTCCGG TCAAGTACGG TGAAAAATCC ATTGAAGTCC CAGGTGCAGT TAAGCTGTGC 300

AACGCTTTGA ACGCTCTACC AAAAGAGAAA TGGGCTGTGG CAACTTCCGG TACCCGTGAT 360

ATGGCACAAA AATGGTTCGA GCATCTGGGA ATCAGGAGAC CAAAGTACTT CATTACCGCT 420

AATGATGTCA AACAGGGTAA GCCTCATCCA GAACCATATC TGAAGGGCAG GAATGGCTTA 480

GGATATCCGA TCAATGAGCA AGACCCTTCC AAATCTAAGG TAGTAGTATT TGAAGACGCT 540

CCAGCAGGTA TTGCCGCCGG AAAAGCCGCC GGTTGTAAGA TCATTGGTAT TGCCACTACT 600

TTCGACTTGG ACTTCCTAAA GGAAAAAGGC TGTGACATCA TTGTCAAAAA CCACGAATCC 660

ATCAGAGTTG GCGGCTACAA TGCCGAAACA GACGAAGTTG AATTCATTTT TGACGACTAC 720

TTATATGCTA AGGACGATCT GTTGAAATGG TAA 753



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2520 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GUT1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TGTATTGGCC ACGATAACCA CCCTTTGTAT ACTGTTTTTG TTTTTCACAT GGTAAATAAC 60

GACTTTTATT AAACAACGTA TGTAAAAACA TAACAAGAAT CTACCCATAC AGGCCATTTC 120

GTAATTCTTC TCTTCTAATT GGAGTAAAAC CATCAATTAA AGGGTGTGGA GTAGCATAGT 180

GAGGGGCTGA CTGCATTGAC AAAAAAATTG AAAAAAAAAA AGGAAAAGGA AAGGAAAAAA 240

AGACAGCCAA GACTTTTAGA ACGGATAAGG TGTAATAAAA TGTGGGGGGA TGCCTGTTCT 300

CGAACCATAT AAAATATACC ATGTGGTTTG AGTTGTGGCC GGAACTATAC AAATAGTTAT 360

ATGTTTCCCT CTCTCTTCCG ACTTGTAGTA TTCTCCAAAC GTTACATATT CCGATCAAGC 420

CAGCGCCTTT ACACTAGTTT AAAACAAGAA CAGAGCCGTA TGTCCAAAAT AATGGAAGAT 480

TTACGAAGTG ACTACGTCCC GCTTATCGCC AGTATTGATG TAGGAACGAC CTCATCCAGA 540

TGCATTCTGT TCAACAGATG GGGCCAGGAC GTTTCAAAAC ACCAAATTGA ATATTCAACT 600

TCAGCATCGA AGGGCAAGAT TGGGGTGTCT GGCCTAAGGA GACCCTCTAC AGCCCCAGCT 660

CGTGAAACAC CAAACGCCGG TGACATCAAA ACCAGCGGAA AGCCCATCTT TTCTGCAGAA 720

GGCTATGCCA TTCAAGAAAC CAAATTCCTA AAAATCGAGG AATTGGACTT GGACTTCCAT 780

AACGAACCCA CGTTGAAGTT CCCCAAACCG GGTTGGGTTG AGTGCCATCC GCAGAAATTA 840

CTGGTGAACG TCGTCCAATG CCTTGCCTCA AGTTTGCTCT CTCTGCAGAC TATCAACAGC 900

GAACGTGTAG CAAACGGTCT CCCACCTTAC AAGGTAATAT GCATGGGTAT AGCAAACATG 960

AGAGAAACCA CAATTCTGTG GTCCCGCCGC ACAGGAAAAC CAATTGTTAA CTACGGTATT 1020

GTTTGGAACG ACACCAGAAC GATCAAAATC GTTAGAGACA AATGGCAAAA CACTAGCGTC 1080

GATAGGCAAC TGCAGCTTAG ACAGAAGACT GGATTGCCAT TGCTCTCCAC GTATTTCTCC 1140

TGTTCCAAGC TGCGCTGGTT CCTCGACAAT GAGCCTCTGT GTACCAAGGC GTATGAGGAG 1200

AACGACCTGA TGTTCGGCAC TGTGGACACA TGGCTGATTT ACCAATTAAC TAAACAAAAG 1260

GCGTTCGTTT CTGACGTAAC CAACGCTTCC AGAACTGGAT TTATGAACCT CTCCACTTTA 1320

AAGTACGACA ACGAGTTGCT GGAATTTTGG GGTATTGACA AGAACCTGAT TCACATGCCC 1380

GAAATTGTGT CCTCATCTCA ATACTACGGT GACTTTGGCA TTCCTGATTG GATAATGGAA 1440

AAGCTACACG ATTCGCCAAA AACAGTACTG CGAGATCTAG TCAAGAGAAA CCTGCCCATA 1500

CAGGGCTGTC TGGGCGACCA AAGCGCATCC ATGGTGGGGC AACTCGCTTA CAAACCCGGT 1560

GCTGCAAAAT GTACTTATGG TACCGGTTGC TTTTTACTGT ACAATACGGG GACCAAAAAA 1620

TTGATCTCCC AACATGGCGC ACTGACGACT CTAGCATTTT GGTTCCCACA TTTGCAAGAG 1680

TACGGTGGCC AAAAACCAGA ATTGAGCAAG CCACATTTTG CATTAGAGGG TTCCGTCGCT 1740

GTGGCTGGTG CTGTGGTCCA ATGGCTACGT GATAATTTAC GATTGATCGA TAAATCAGAG 1800

GATGTCGGAC CGATTGCATC TACGGTTCCT GATTCTGGTG GCGTAGTTTT CGTCCCCGCA 1860

TTTAGTGGCC TATTCGCTCC CTATTGGGAC CCAGATGCCA GAGCCACCAT AATGGGGATG 1920

TCTCAATTCA CTACTGCCTC CCACATCGCC AGAGCTGCCG TGGAAGGTGT TTGCTTTCAA 1980

GCCAGGGCTA TCTTGAAGGC AATGAGTTCT GACGCGTTTG GTGAAGGTTC CAAAGACAGG 2040

GACTTTTTAG AGGAAATTTC CGACGTCACA TATGAAAAGT CGCCCCTGTC GGTTCTGGCA 2100

GTGGATGGCG GGATGTCGAG GTCTAATGAA GTCATGCAAA TTCAAGCCGA TATCCTAGGT 2160

CCCTGTGTCA AAGTCAGAAG GTCTCCGACA GCGGAATGTA CCGCATTGGG GGCAGCCATT 2220

GCAGCCAATA TGGCTTTCAA GGATGTGAAC GAGCGCCCAT TATGGAAGGA CCTACACGAT 2280

GTTAAGAAAT GGGTCTTTTA CAATGGAATG GAGAAAAACG AACAAATATC ACCAGAGGCT 2340

CATCCAAACC TTAAGATATT CAGAAGTGAA TCCGACGATG CTGAAAGGAG AAAGCATTGG 2400

AAGTATTGGG AAGTTGCCGT GGAAAGATCC AAAGGTTGGC TGAAGGACAT AGAAGGTGAA 2460

CACGAACAGG TTCTAGAAAA CTTCCAATAA CAACATAAAT AATTTCTATT AACAATGTAA 2520

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 391 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPD1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ser Ala Ala Ala Asp Arg Leu Asn Leu Thr Ser Gly His Leu Asn
1 5 10 15

Ala Gly Arg Lys Arg Ser Ser Ser Ser Val Ser Leu Lys Ala Ala Glu
20 25 30

Lys Pro Phe Lys Val Thr Val Ile Gly Ser Gly Asn Trp Gly Thr Thr
35 40 45

Ile Ala Lys Val Val Ala Glu Asn Cys Lys Gly Tyr Pro Glu Val Phe
50 55 60

Ala Pro Ile Val Gln Met Trp Val Phe Glu Glu Glu Ile Asn Gly Glu
65 70 75 80

Lys Leu Thr Glu Ile Ile Asn Thr Arg His Gln Asn Val Lys Tyr Leu
85 90 95

Pro Gly Ile Thr Leu Pro Asp Asn Leu Val Ala Asn Pro Asp Leu Ile
100 105 110

Asp Ser Val Lys Asp Val Asp Ile Ile Val Phe Asn Ile Pro His Gln
115 120 125

Phe Leu Pro Arg Ile Cys Ser Gln Leu Lys Gly His Val Asp Ser His
130 135 140

Val Arg Ala Ile Ser Cys Leu Lys Gly Phe Glu Val Gly Ala Lys Gly
145 150 155 160

Val Gln Leu Leu Ser Ser Tyr Ile Thr Glu Glu Leu Gly Ile Gln Cys
165 170 175

Gly Ala Leu Ser Gly Ala Asn Ile Ala Thr Glu Val Ala Gln Glu His
180 185 190

Trp Ser Glu Thr Thr Val Ala Tyr His Ile Pro Lys Asp Phe Arg Gly
195 200 205

Glu Gly Lys Asp Val Asp His Lys Val Leu Lys Ala Leu Phe His Arg
210 215 220

Pro Tyr Phe His Val Ser Val Ile Glu Asp Val Ala Gly Ile Ser Ile
225 230 235 240

Cys Gly Ala Leu Lys Asn Val Val Ala Leu Gly Cys Gly Phe Val Glu
245 250 255

Gly Leu Gly Trp Gly Asn Asn Ala Ser Ala Ala Ile Gln Arg Val Gly
260 265 270

Leu Gly Glu Ile Ile Arg Phe Gly Gln Met Phe Phe Pro Glu Ser Arg
275 280 285

Glu Glu Thr Tyr Tyr Gln Glu Ser Ala Gly Val Ala Asp Leu Ile Thr
290 295 300

Thr Cys Ala Gly Gly Arg Asn Val Lys Val Ala Arg Leu Met Ala Thr
305 310 315 320

Ser Gly Lys Asp Ala Trp Glu Cys Glu Lys Glu Leu Leu Asn Gly Gln
325 330 335

Ser Ala Gln Gly Leu Ile Thr Cys Lys Glu Val His Glu Trp Leu Glu
340 345 350

Thr Cys Gly Ser Val Glu Asp Phe Pro Leu Phe Glu Ala Val Tyr Gln
355 360 365

Ile Val Tyr Asn Asn Tyr Pro Met Lys Asn Leu Pro Asp Met Ile Glu
370 375 380

Glu Leu Asp Leu His Glu Asp
385 390

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 384 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPD2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Thr Ala His Thr Asn Ile Lys Gln His Lys His Cys His Glu Asp
1 5 10 15

His Pro Ile Arg Arg Ser Asp Ser Ala Val Ser Ile Val His Leu Lys
20 25 30

Arg Ala Pro Phe Lys Val Thr Val Ile Gly Ser Gly Asn Trp Gly Thr
35 40 45

Thr Ile Ala Lys Val Ile Ala Glu Asn Thr Glu Leu His Ser His Ile
50 55 60

Phe Glu Pro Glu Val Arg Met Trp Val Phe Asp Glu Lys Ile Gly Asp
65 70 75 80

Glu Asn Leu Thr Asp Ile Ile Asn Thr Arg His Gln Asn Val Lys Tyr
85 90 95

Leu Pro Asn Ile Asp Leu Pro His Asn Leu Val Ala Asp Pro Asp Leu
100 105 110

Leu His Ser Ile Lys Gly Ala Asp Ile Leu Val Phe Asn Ile Pro His
115 120 125

Gln Phe Leu Pro Asn Ile Val Lys Gln Leu Gln Gly His Val Ala Pro
130 135 140

His Val Arg Ala Ile Ser Cys Leu Lys Gly Phe Glu Leu Gly Ser Lys
145 150 155 160

Gly Val Gln Leu Leu Ser Ser Tyr Val Thr Asp Glu Leu Gly Ile Gln
165 170 175

Cys Gly Ala Leu Ser Gly Ala Asn Leu Ala Pro Glu Val Ala Lys Glu
180 185 190

His Trp Ser Glu Thr Thr Val Ala Tyr Gln Leu Pro Lys Asp Tyr Gln
195 200 205

Gly Asp Gly Lys Asp Val Asp His Lys Ile Leu Lys Leu Leu Phe His
210 215 220

Arg Pro Tyr Phe His Val Asn Val Ile Asp Asp Val Ala Gly Ile Ser
225 230 235 240

Ile Ala Gly Ala Leu Lys Asn Val Val Ala Leu Ala Cys Gly Phe Val
245 250 255

Glu Gly Met Gly Trp Gly Asn Asn Ala Ser Ala Ala Ile Gln Arg Leu
260 265 270

Gly Leu Gly Glu Ile Ile Lys Phe Gly Arg Met Phe Phe Pro Glu Ser
275 280 285

Lys Val Glu Thr Tyr Tyr Gln Glu Ser Ala Gly Val Ala Asp Leu Ile
290 295 300

Thr Thr Cys Ser Gly Gly Arg Asn Val Lys Val Ala Thr Tyr Met Ala
305 310 315 320

Lys Thr Gly Lys Ser Ala Leu Glu Ala Glu Lys Glu Leu Leu Asn Gly
325 330 335

Gln Ser Ala Gln Gly Ile Ile Thr Cys Arg Glu Val His Glu Trp Leu
340 345 350

Gln Thr Cys Glu Leu Thr Gln Glu Phe Pro Ile Ile Arg Gly Ser Leu
355 360 365

Pro Asp Ser Leu Gln Gln Arg Pro His Gly Arg Pro Thr Gly Asp Asp
370 375 380

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 614 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GUT2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Thr Arg Ala Thr Trp Cys Asn Ser Pro Pro Pro Leu His Arg Gln
1 5 10 15

Val Ser Arg Arg Asp Leu Leu Asp Arg Leu Asp Lys Thr His Gln Phe
20 25 30

Asp Val Leu Ile Ile Gly Gly Gly Ala Thr Gly Thr Gly Cys Ala Leu
35 40 45

Asp Ala Ala Thr Arg Gly Leu Asn Val Ala Leu Val Glu Lys Gly Asp
50 55 60

Phe Ala Ser Gly Thr Ser Ser Lys Ser Thr Lys Met Ile His Gly Gly
65 70 75 80

Val Arg Tyr Leu Glu Lys Ala Phe Trp Glu Phe Ser Lys Ala Gln Leu
85 90 95

Asp Leu Val Ile Glu Ala Leu Asn Glu Arg Lys His Leu Ile Asn Thr
100 105 110

Ala Pro His Leu Cys Thr Val Leu Pro Ile Leu Ile Pro Ile Tyr Ser
115 120 125

Thr Trp Gln Val Pro Tyr Ile Tyr Met Gly Cys Lys Phe Tyr Asp Phe
130 135 140

Phe Gly Gly Ser Gln Asn Leu Lys Lys Ser Tyr Leu Leu Ser Lys Ser
145 150 155 160

Ala Thr Val Glu Lys Ala Pro Met Leu Thr Thr Asp Asn Leu Lys Ala
165 170 175

Ser Leu Val Tyr His Asp Gly Ser Phe Asn Asp Ser Arg Leu Asn Ala
180 185 190

Thr Leu Ala Ile Thr Gly Val Glu Asn Gly Ala Thr Val Leu Ile Tyr
195 200 205

Val Glu Val Gln Lys Leu Ile Lys Asp Pro Thr Ser Gly Lys Val Ile
210 215 220

Gly Ala Glu Ala Arg Asp Val Glu Thr Asn Glu Leu Val Arg Ile Asn
225 230 235 240

Ala Lys Cys Val Val Asn Ala Thr Gly Pro Tyr Ser Asp Ala Ile Leu
245 250 255

Gln Met Asp Arg Asn Pro Ser Gly Leu Pro Asp Ser Pro Leu Asn Asp
260 265 270

Asn Ser Lys Ile Lys Ser Thr Phe Asn Gln Ile Ser Val Met Asp Pro
275 280 285

Lys Met Val Ile Pro Ser Ile Gly Val His Ile Val Leu Pro Ser Phe
290 295 300

Tyr Ser Pro Lys Asp Met Gly Leu Leu Asp Val Arg Thr Ser Asp Gly
305 310 315 320

Arg Val Met Phe Phe Leu Pro Trp Gln Gly Lys Val Leu Ala Gly Thr
325 330 335

Thr Asp Ile Pro Leu Lys Gln Val Pro Glu Asn Pro Met Pro Thr Glu
340 345 350

Ala Asp Ile Gln Asp Ile Leu Lys Glu Leu Gln His Tyr Ile Glu Phe
355 360 365

Pro Val Lys Arg Glu Asp Val Leu Ser Ala Trp Ala Gly Val Arg Pro
370 375 380

Leu Val Arg Asp Pro Arg Thr Ile Pro Ala Asp Gly Lys Lys Gly Ser
385 390 395 400

Ala Thr Gln Gly Val Val Arg Ser His Phe Leu Phe Thr Ser Asp Asn
405 410 415

Gly Leu Ile Thr Ile Ala Gly Gly Lys Trp Thr Thr Tyr Arg Gln Met
420 425 430

Ala Glu Glu Thr Val Asp Lys Val Val Glu Val Gly Gly Phe His Asn
435 440 445

Leu Lys Pro Cys His Thr Arg Asp Ile Lys Leu Ala Gly Ala Glu Glu
450 455 460

Trp Thr Gln Asn Tyr Val Ala Leu Leu Ala Gln Asn Tyr His Leu Ser
465 470 475 480

Ser Lys Met Ser Asn Tyr Leu Val Gln Asn Tyr Gly Thr Arg Ser Ser
485 490 495

Ile Ile Cys Glu Phe Phe Lys Glu Ser Met Glu Asn Lys Leu Pro Leu
500 505 510

Ser Leu Ala Asp Lys Glu Asn Asn Val Ile Tyr Ser Ser Glu Glu Asn
515 520 525

Asn Leu Val Asn Phe Asp Thr Phe Arg Tyr Pro Phe Thr Ile Gly Glu
530 535 540

Leu Lys Tyr Ser Met Gln Tyr Glu Tyr Cys Arg Thr Pro Leu Asp Phe
545 550 555 560

Leu Leu Arg Arg Thr Arg Phe Ala Phe Leu Asp Ala Lys Glu Ala Leu
565 570 575

Asn Ala Val His Ala Thr Val Lys Val Met Gly Asp Glu Phe Asn Trp
580 585 590

Ser Glu Lys Lys Arg Gln Trp Glu Leu Glu Lys Thr Val Asn Phe Ile
595 600 605

Gln Gly Arg Phe Gly Val
610

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 339 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPSA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Met Asn Gln Arg Asn Ala Ser Met Thr Val Ile Gly Ala Gly Ser Tyr
1 5 10 15

Gly Thr Ala Leu Ala Ile Thr Leu Ala Arg Asn Gly His Glu Val Val
20 25 30

Leu Trp Gly His Asp Pro Glu His Ile Ala Thr Leu Glu Arg Asp Arg
35 40 45

Cys Asn Ala Ala Phe Leu Pro Asp Val Pro Phe Pro Asp Thr Leu His
50 55 60

Leu Glu Ser Asp Leu Ala Thr Ala Leu Ala Ala Ser Arg Asn Ile Leu
65 70 75 80

Val Val Val Pro Ser His Val Phe Gly Glu Val Leu Arg Gln Ile Lys
85 90 95

Pro Leu Met Arg Pro Asp Ala Arg Leu Val Trp Ala Thr Lys Gly Leu
100 105 110

Glu Ala Glu Thr Gly Arg Leu Leu Gln Asp Val Ala Arg Glu Ala Leu
115 120 125

Gly Asp Gln Ile Pro Leu Ala Val Ile Ser Gly Pro Thr Phe Ala Lys
130 135 140

Glu Leu Ala Ala Gly Leu Pro Thr Ala Ile Ser Leu Ala Ser Thr Asp
145 150 155 160

Gln Thr Phe Ala Asp Asp Leu Gln Gln Leu Leu His Cys Gly Lys Ser
165 170 175

Phe Arg Val Tyr Ser Asn Pro Asp Phe Ile Gly Val Gln Leu Gly Gly
180 185 190

Ala Val Lys Asn Val Ile Ala Ile Gly Ala Gly Met Ser Asp Gly Ile
195 200 205

Gly Phe Gly Ala Asn Ala Arg Thr Ala Leu Ile Thr Arg Gly Leu Ala
210 215 220

Glu Met Ser Arg Leu Gly Ala Ala Leu Gly Ala Asp Pro Ala Thr Phe
225 230 235 240

Met Gly Met Ala Gly Leu Gly Asp Leu Val Leu Thr Cys Thr Asp Asn
245 250 255

Gln Ser Arg Asn Arg Arg Phe Gly Met Met Leu Gly Gln Gly Met Asp
260 265 270

Val Gln Ser Ala Gln Glu Lys Ile Gly Gln Val Val Glu Gly Tyr Arg
275 280 285

Asn Thr Lys Glu Val Arg Glu Leu Ala His Arg Phe Gly Val Glu Met
290 295 300

Pro Ile Thr Glu Glu Ile Tyr Gln Val Leu Tyr Cys Gly Lys Asn Ala
305 310 315 320 

Arg Glu Ala Ala Leu Thr Leu Leu Gly Arg Ala Arg Lys Asp Glu Arg 
325 330 335 

Ser Ser His 



(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 501 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GLPD 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Glu Thr Lys Asp Leu Ile Val Ile Gly Gly Gly Ile Asn Gly Ala
1 5 10 15

Gly Ile Ala Ala Asp Ala Ala Gly Arg Gly Leu Ser Val Leu Met Leu
20 25 30

Glu Ala Gln Asp Leu Ala Cys Ala Thr Ser Ser Ala Ser Ser Lys Leu
35 40 45

Ile His Gly Gly Leu Arg Tyr Leu Glu His Tyr Glu Phe Arg Leu Val
50 55 60

Ser Glu Ala Leu Ala Glu Arg Glu Val Leu Leu Lys Met Ala Pro His
65 70 75 80

Ile Ala Phe Pro Met Arg Phe Arg Leu Pro His Arg Pro His Leu Arg
85 90 95

Pro Ala Trp Met Ile Arg Ile Gly Leu Phe Met Tyr Asp His Leu Gly
100 105 110

Lys Arg Thr Ser Leu Pro Gly Ser Thr Gly Leu Arg Phe Gly Ala Asn
115 120 125

Ser Val Leu Lys Pro Glu Ile Lys Arg Gly Phe Glu Tyr Ser Asp Cys
130 135 140

Trp Val Asp Asp Ala Arg Leu Val Leu Ala Asn Ala Gln Met Val Val
145 150 155 160

Arg Lys Gly Gly Glu Val Leu Thr Arg Thr Arg Ala Thr Ser Ala Arg
165 170 175

Arg Glu Asn Gly Leu Trp Ile Val Glu Ala Glu Asp Ile Asp Thr Gly
180 185 190

Lys Lys Tyr Ser Trp Gln Ala Arg Gly Leu Val Asn Ala Thr Gly Pro
195 200 205

Trp Val Lys Gln Phe Phe Asp Asp Gly Met His Leu Pro Ser Pro Tyr
210 215 220

Gly Ile Arg Leu Ile Lys Gly Ser His Ile Val Val Pro Arg Val His
225 230 235 240

Thr Gln Lys Gln Ala Tyr Ile Leu Gln Asn Glu Asp Lys Arg Ile Val
245 250 255

Phe Val Ile Pro Trp Met Asp Glu Phe Ser Ile Ile Gly Thr Thr Asp
260 265 270

Val Glu Tyr Lys Gly Asp Pro Lys Ala Val Lys Ile Glu Glu Ser Glu
275 280 285

Ile Asn Tyr Leu Leu Asn Val Tyr Asn Thr His Phe Lys Lys Gln Leu
290 295 300

Ser Arg Asp Asp Ile Val Trp Thr Tyr Ser Gly Val Arg Pro Leu Cys
305 310 315 320

Asp Asp Glu Ser Asp Ser Pro Gln Ala Ile Thr Arg Asp Tyr Thr Leu
325 330 335

Asp Ile His Asp Glu Asn Gly Lys Ala Pro Leu Leu Ser Val Phe Gly
340 345 350

Gly Lys Leu Thr Thr Tyr Arg Lys Leu Ala Glu His Ala Leu Glu Lys
355 360 365

Leu Thr Pro Tyr Tyr Gln Gly Ile Gly Pro Ala Trp Thr Lys Glu Ser
370 375 380

Val Leu Pro Gly Gly Ala Ile Glu Gly Asp Arg Asp Asp Tyr Ala Ala
385 390 395 400

Arg Leu Arg Arg Arg Tyr Pro Phe Leu Thr Glu Ser Leu Ala Arg His
405 410 415

Tyr Ala Arg Thr Tyr Gly Ser Asn Ser Glu Leu Leu Leu Gly Asn Ala
420 425 430

Gly Thr Val Ser Asp Leu Gly Glu Asp Phe Gly His Glu Phe Tyr Glu
435 440 445

Ala Glu Leu Lys Tyr Leu Val Asp His Glu Trp Val Arg Arg Ala Asp
450 455 460

Asp Ala Leu Trp Arg Arg Thr Lys Gln Gly Met Trp Leu Asn Ala Asp
465 470 475 480

Gln Gln Ser Arg Val Ser Gln Trp Leu Val Glu Tyr Thr Gln Gln Arg
485 490 495 

Leu Ser Leu Ala Ser 
500 



(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 542 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GLPABC 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Lys Thr Arg Asp Ser Gln Ser Ser Asp Val Ile Ile Ile Gly Gly
1 5 10 15

Gly Ala Thr Gly Ala Gly Ile Ala Arg Asp Cys Ala Leu Arg Gly Leu
20 25 30

Arg Val Ile Leu Val Glu Arg His Asp Ile Ala Thr Gly Ala Thr Gly
35 40 45

Arg Asn His Gly Leu Leu His Ser Gly Ala Arg Tyr Ala Val Thr Asp
50 55 60

Ala Glu Ser Ala Arg Glu Cys Ile Ser Glu Asn Gln Ile Leu Lys Arg
65 70 75 80

Ile Ala Arg His Cys Val Glu Pro Thr Asn Gly Leu Phe Ile Thr Leu
85 90 95

Pro Glu Asp Asp Leu Ser Phe Gln Ala Thr Phe Ile Arg Ala Cys Glu
100 105 110

Glu Ala Gly Ile Ser Ala Glu Ala Ile Asp Pro Gln Gln Ala Arg Ile
115 120 125

Ile Glu Pro Ala Val Asn Pro Ala Leu Ile Gly Ala Val Lys Val Pro
130 135 140 

Asp Gly Thr Val Asp Pro Phe Arg Leu Thr Ala Ala Asn Met Leu Asp 
145 150 155 160 

Ala Lys Glu His Gly Ala Val Ile Leu Thr Ala His Glu Val Thr Gly
165 170 175

Leu Ile Arg Glu Gly Ala Thr Val Cys Gly Val Arg Val Arg Asn His
180 185 190

Leu Thr Gly Glu Thr Gln Ala Leu His Ala Pro Val Val Val Asn Ala
195 200 205

Ala Gly Ile Trp Gly Gln His Ile Ala Glu Tyr Ala Asp Leu Arg Ile
210 215 220

Arg Met Phe Pro Ala Lys Gly Ser Leu Leu Ile Met Asp His Arg Ile
225 230 235 240

Asn Gln His Val Ile Asn Arg Cys Arg Lys Pro Ser Asp Ala Asp Ile
245 250 255

Leu Val Pro Gly Asp Thr Ile Ser Leu Ile Gly Thr Thr Ser Leu Arg
260 265 270

Ile Asp Tyr Asn Glu Ile Asp Asp Asn Arg Val Thr Ala Glu Glu Val
275 280 285

Asp Ile Leu Leu Arg Glu Gly Glu Lys Leu Ala Pro Val Met Ala Lys
290 295 300

Thr Arg Ile Leu Arg Ala Tyr Ser Gly Val Arg Pro Leu Val Ala Ser
305 310 315 320

Asp Asp Asp Pro Ser Gly Arg Asn Leu Ser Arg Gly Ile Val Leu Leu
325 330 335

Asp His Ala Glu Arg Asp Gly Leu Asp Gly Phe Ile Thr Ile Thr Gly
340 345 350

Gly Lys Leu Met Thr Tyr Arg Leu Met Ala Glu Trp Ala Thr Asp Ala
355 360 365

Val Cys Arg Lys Leu Gly Asn Thr Arg Pro Cys Thr Thr Ala Asp Leu
370 375 380

Ala Leu Pro Gly Ser Gln Glu Pro Ala Glu Val Thr Leu Arg Lys Val
385 390 395 400

Ile Ser Leu Pro Ala Pro Leu Arg Gly Ser Ala Val Tyr Arg His Gly
405 410 415

Asp Arg Thr Pro Ala Trp Leu Ser Glu Gly Arg Leu His Arg Ser Leu
420 425 430

Val Cys Glu Cys Glu Ala Val Thr Ala Gly Glu Val Gln Tyr Ala Val
435 440 445

Glu Asn Leu Asn Val Asn Ser Leu Leu Asp Leu Arg Arg Arg Thr Arg
450 455 460

Val Gly Met Gly Thr Cys Gln Gly Glu Leu Cys Ala Cys Arg Ala Ala
465 470 475 480

Gly Leu Leu Gln Arg Phe Asn Val Thr Thr Ser Ala Gln Ser Ile Glu
485 490 495

Gln Leu Ser Thr Phe Leu Asn Glu Arg Trp Lys Gly Val Gln Pro Ile
500 505 510

Ala Trp Gly Asp Ala Leu Arg Glu Ser Glu Phe Thr Arg Trp Val Tyr
515 520 525

Gln Gly Leu Cys Gly Leu Glu Lys Glu Gln Lys Asp Ala Leu
530 535 540

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 250 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: GPP2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 

Met Gly Leu Thr Thr Lys Pro Leu Ser Leu Lys Val Asn Ala Ala Leu
1 5 10 15

Phe Asp Val Asp Gly Thr Ile Ile Ile Ser Gln Pro Ala Ile Ala Ala
20 25 30

Phe Trp Arg Asp Phe Gly Lys Asp Lys Pro Tyr Phe Asp Ala Glu His
35 40 45

Val Ile Gln Val Ser His Gly Trp Arg Thr Phe Asp Ala Ile Ala Lys
50 55 60

Phe Ala Pro Asp Phe Ala Asn Glu Glu Tyr Val Asn Lys Leu Glu Ala
65 70 75 80

Glu Ile Pro Val Lys Tyr Gly Glu Lys Ser Ile Glu Val Pro Gly Ala
85 90 95

Val Lys Leu Cys Asn Ala Leu Asn Ala Leu Pro Lys Glu Lys Trp Ala
100 105 110

Val Ala Thr Ser Gly Thr Arg Asp Met Ala Gln Lys Trp Phe Glu His
115 120 125

Leu Gly Ile Arg Arg Pro Lys Tyr Phe Ile Thr Ala Asn Asp Val Lys
130 135 140

Gln Gly Lys Pro His Pro Glu Pro Tyr Leu Lys Gly Arg Asn Gly Leu
145 150 155 160

Gly Tyr Pro Ile Asn Glu Gln Asp Pro Ser Lys Ser Lys Val Val Val
165 170 175

Phe Glu Asp Ala Pro Ala Gly Ile Ala Ala Gly Lys Ala Ala Gly Cys
180 185 190

Lys Ile Ile Gly Ile Ala Thr Thr Phe Asp Leu Asp Phe Leu Lys Glu
195 200 205

Lys Gly Cys Asp Ile Ile Val Lys Asn His Glu Ser Ile Arg Val Gly
210 215 220

Gly Tyr Asn Ala Glu Thr Asp Glu Val Glu Phe Ile Phe Asp Asp Tyr
225 230 235 240 

Leu Tyr Ala Lys Asp Asp Leu Leu Lys Trp 
245 250 

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 709 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GUT1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Phe Pro Ser Leu Phe Arg Leu Val Val Phe Ser Lys Arg Tyr Ile
1 5 10 15

Phe Arg Ser Ser Gln Arg Leu Tyr Thr Ser Leu Lys Gln Glu Gln Ser
20 25 30

Arg Met Ser Lys Ile Met Glu Asp Leu Arg Ser Asp Tyr Val Pro Leu
35 40 45

Ile Ala Ser Ile Asp Val Gly Thr Thr Ser Ser Arg Cys Ile Leu Phe
50 55 60

Asn Arg Trp Gly Gln Asp Val Ser Lys His Gln Ile Glu Tyr Ser Thr
65 70 75 80

Ser Ala Ser Lys Gly Lys Ile Gly Val Ser Gly Leu Arg Arg Pro Ser
85 90 95

Thr Ala Pro Ala Arg Glu Thr Pro Asn Ala Gly Asp Ile Lys Thr Ser
100 105 110

Gly Lys Pro Ile Phe Ser Ala Glu Gly Tyr Ala Ile Gln Glu Thr Lys
115 120 125

Phe Leu Lys Ile Glu Glu Leu Asp Leu Asp Phe His Asn Glu Pro Thr
130 135 140

Leu Lys Phe Pro Lys Pro Gly Trp Val Glu Cys His Pro Gln Lys Leu
145 150 155 160

Leu Val Asn Val Val Gln Cys Leu Ala Ser Ser Leu Leu Ser Leu Gln
165 170 175

Thr Ile Asn Ser Glu Arg Val Ala Asn Gly Leu Pro Pro Tyr Lys Val
180 185 190

Ile Cys Met Gly Ile Ala Asn Met Arg Glu Thr Thr Ile Leu Trp Ser
195 200 205

Arg Arg Thr Gly Lys Pro Ile Val Asn Tyr Gly Ile Val Trp Asn Asp
210 215 220

Thr Arg Thr Ile Lys Ile Val Arg Asp Lys Trp Gln Asn Thr Ser Val
225 230 235 240

Asp Arg Gln Leu Gln Leu Arg Gln Lys Thr Gly Leu Pro Leu Leu Ser
245 250 255

Thr Tyr Phe Ser Cys Ser Lys Leu Arg Trp Phe Leu Asp Asn Glu Pro
260 265 270

Leu Cys Thr Lys Ala Tyr Glu Glu Asn Asp Leu Met Phe Gly Thr Val
275 280 285

Asp Thr Trp Leu Ile Tyr Gln Leu Thr Lys Gln Lys Ala Phe Val Ser
290 295 300

Asp Val Thr Asn Ala Ser Arg Thr Gly Phe Met Asn Leu Ser Thr Leu
305 310 315 320

Lys Tyr Asp Asn Glu Leu Leu Glu Phe Trp Gly Ile Asp Lys Asn Leu
325 330 335

Ile His Met Pro Glu Ile Val Ser Ser Ser Gln Tyr Tyr Gly Asp Phe
340 345 350

Gly Ile Pro Asp Trp Ile Met Glu Lys Leu His Asp Ser Pro Lys Thr
355 360 365

Val Leu Arg Asp Leu Val Lys Arg Asn Leu Pro Ile Gln Gly Cys Leu
370 375 380

Gly Asp Gln Ser Ala Ser Met Val Gly Gln Leu Ala Tyr Lys Pro Gly
385 390 395 400

Ala Ala Lys Cys Thr Tyr Gly Thr Gly Cys Phe Leu Leu Tyr Asn Thr
405 410 415

Gly Thr Lys Lys Leu Ile Ser Gln His Gly Ala Leu Thr Thr Leu Ala
420 425 430

Phe Trp Phe Pro His Leu Gln Glu Tyr Gly Gly Gln Lys Pro Glu Leu
435 440 445

Ser Lys Pro His Phe Ala Leu Glu Gly Ser Val Ala Val Ala Gly Ala
450 455 460

Val Val Gln Trp Leu Arg Asp Asn Leu Arg Leu Ile Asp Lys Ser Glu
465 470 475 480

Asp Val Gly Pro Ile Ala Ser Thr Val Pro Asp Ser Gly Gly Val Val
485 490 495

Phe Val Pro Ala Phe Ser Gly Leu Phe Ala Pro Tyr Trp Asp Pro Asp
500 505 510

Ala Arg Ala Thr Ile Met Gly Met Ser Gln Phe Thr Thr Ala Ser His
515 520 525

Ile Ala Arg Ala Ala Val Glu Gly Val Cys Phe Gln Ala Arg Ala Ile
530 535 540

Leu Lys Ala Met Ser Ser Asp Ala Phe Gly Glu Gly Ser Lys Asp Arg
545 550 555 560

Asp Phe Leu Glu Glu Ile Ser Asp Val Thr Tyr Glu Lys Ser Pro Leu
565 570 575

Ser Val Leu Ala Val Asp Gly Gly Met Ser Arg Ser Asn Glu Val Met
580 585 590

Gln Ile Gln Ala Asp Ile Leu Gly Pro Cys Val Lys Val Arg Arg Ser
595 600 605

Pro Thr Ala Glu Cys Thr Ala Leu Gly Ala Ala Ile Ala Ala Asn Met
610 615 620

Ala Phe Lys Asp Val Asn Glu Arg Pro Leu Trp Lys Asp Leu His Asp
625 630 635 640

Val Lys Lys Trp Val Phe Tyr Asn Gly Met Glu Lys Asn Glu Gln Ile
645 650 655

Ser Pro Glu Ala His Pro Asn Leu Lys Ile Phe Arg Ser Glu Ser Asp
660 665 670

Asp Ala Glu Arg Arg Lys His Trp Lys Tyr Trp Glu Val Ala Val Glu
675 680 685

Arg Ser Lys Gly Trp Leu Lys Asp Ile Glu Gly Glu His Glu Gln Val
690 695 700

Leu Glu Asn Phe Gln
705 

(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12145 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: PHK28-26 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

GTCGACCACC ACGGTGGTGA CTTTAATGCC GCTCTCATGC AGCAGCTCGG TGGCGGTCTC 60

AAAATTCAGG ATGTCGCCGG TATAGTTTTT GATAATCAGC AAGACGCCTT CGCCGCCGTC 120

AATTTGCATC GCGCATTCAA ACATTTTGTC CGGCGTCGGC GAGGTGAATA TTTCCCCCGG 180

ACAGGCGCCG GAGAGCATGC CCTGGCCGAT ATAGCCGCAG TGCATCGGTT CATGTCCGCT 240

GCCGCCGCCG GAGAGCAGGG CCACCTTGCC AGCCACCGGC GCGTCGGTGC GGGTCACATA 300

CAGCGGGTCC TGATGCAGGG TCAGCTGCGG ATGGGCTTTA GCCAGCCCCT GTAATTGTTC 360

ATTCAGTACA TCTTCAACAC GGTTAATCAG CTTTTTCATT ATTCAGTGCT CCGTTGGAGA 420

AGGTTCGATG CCGCCTCTCT GCTGGCGGAG GCGGTCATCG CGTAGGGGTA TCGTCTGACG 480

GTGGAGCGTG CCTGGCGATA TGATGATTCT GGCTGAGCGG ACGAAAAAAA GAATGCCCCG 540

ACGATCGGGT TTCATTACGA AACATTGCTT CCTGATTTTG TTTCTTTATG GAACGTTTTT 600

GCTGAGGATA TGGTGAAAAT GCGAGCTGGC GCGCTTTTTT TCTTCTGCCA TAAGCGGCGG 660

TCAGGATAGC CGGCGAAGCG GGTGGGAAAA AATTTTTTGC TGATTTTCTG CCGACTGCGG 720

GAGAAAAGGC GGTCAAACAC GGAGGATTGT AAGGGCATTA TGCGGCAAAG GAGCGGATCG 780

GGATCGCAAT CCTGACAGAG ACTAGGGTTT TTTGTTCCAA TATGGAACGT AAAAAATTAA 840

CCTGTGTTTC ATATCAGAAC AAAAAGGCGA AAGATTTTTT TGTTCCCTGC CGGCCCTACA 900

GTGATCGCAC TGCTCCGGTA CGCTCCGTTC AGGCCGCGCT TCACTGGCCG GCGCGGATAA 960

CGCCAGGGCT CATCATGTCT ACATGCGCAC TTATTTGAGG GTGAAAGGAA TGCTAAAAGT 1020

TATTCAATCT CCAGCCAAAT ATCTTCAGGG TCCTGATGCT GCTGTTCTGT TCGGTCAATA 1080

TGCCAAAAAC CTGGCGGAGA GCTTCTTCGT CATCGCTGAC GATTTCGTAA TGAAGCTGGC 1140

GGGAGAGAAA GTGGTGAATG GCCTGCAGAG CCACGATATT CGCTGCCATG CGGAACGGTT 1200

TAACGGCGAA TGCAGCCATG CGGAAATCAA CCGTCTGATG GCGATTTTGC AAAAACAGGG 1260

CTGCCGCGGC GTGGTCGGGA TCGGCGGTGG TAAAACCCTC GATACCGCGA AGGCGATCGG 1320

TTACTACCAG AAGCTGCCGG TGGTGGTGAT CCCGACCATC GCCTCGACCG ATGCGCCAAC 1380

CAGCGCGCTG TCGGTGATCT ACACCGAAGC GGGCGAGTTT GAAGAGTATC TGATCTATCC 1440

GAAAAACCCG GATATGGTGG TGATGGACAC GGCGATTATC GCCAAAGCGC CGGTACGCCT 1500

GCTGGTCTCC GGCATGGGCG ATGCGCTCTC CACCTGGTTC GAGGCCAAAG CTTGCTACGA 1560

TGCGCGCGCC ACCAGCATGG CCGGAGGACA GTCCACCGAG GCGGCGCTGA GCCTCGCCCG 1620

CCTGTGCTAT GATACGCTGC TGGCGGAGGG CGAAAAGGCC CGTCTGGCGG CGCAGGCCGG 1680

GGTAGTGACC GAAGCGCTGG AGCGCATCAT CGAGGCGAAC ACTTACCTCA GCGGCATTGG 1740

CTTTGAAAGC AGTGGCCTGG CCGCTGCCCA TGCAATCCAC AACGGTTTCA CCATTCTTGA 1800

AGAGTGCCAT CACCTGTATC ACGGTGAGAA AGTGGCCTTC GGTACCCTGG CGCAGCTGGT 1860

GCTGCAGAAC AGCCCGATGG ACGAGATTGA AACGGTGCAG GGCTTCTGCC AGCGCGTCGG 1920

CCTGCCGGTG ACGCTCGCGC AGATGGGCGT CAAAGAGGGG ATCGACGAGA AAATCGCCGC 1980

GGTGGCGAAA GCTACCTGCG CGGAAGGGGA AACCATCCAT AATATGCCGT TTGCGGTGAC 2040

CCCGGAGAGC GTCCATGCCG CTATCCTCAC CGCCGATCTG TTAGGCCAGC AGTGGCTGGC 2100

GCGTTAATTC GCGGTGGCTA AACCGCTGGC CCAGGTCAGC GGTTTTTCTT TCTCCCCTCC 2160

GGCAGTCGCT GCCGGAGGGG TTCTCTATGG TACAACGCGG AAAAGGATAT GACTGTTCAG 2220

ACTCAGGATA CCGGGAAGGC GGTCTCTTCC GTCATTGCCC AGTCATGGCA CCGCTGCAGC 2280

AAGTTTATGC AGCGCGAAAC CTGGCAAACG CCGCACCAGG CCCAGGGCCT GACCTTCGAC 2340

TCCATCTGTC GGCGTAAAAC CGCGCTGCTC ACCATCGGCC AGGCGGCGCT GGAAGACGCC 2400

TGGGAGTTTA TGGACGGCCG CCCCTGCGCG CTGTTTATTC TTGATGAGTC CGCCTGCATC 2460

CTGAGCCGTT GCGGCGAGCC GCAAACCCTG GCCCAGCTGG CTGCCCTGGG ATTTCGCGAC 2520

GGCAGCTATT GTGCGGAGAG CATTATCGGC ACCTGCGCGC TGTCGCTGGC CGCGATGCAG 2580

GGCCAGCCGA TCAACACCGC CGGCGATCGG CATTTTAAGC AGGCGCTACA GCCATGGAGT 2640

TTTTGCTCGA CGCCGGTGTT TGATAACCAC GGGCGGCTGT TCGGCTCTAT CTCGCTTTGC 2700

TGTCTGGTCG AGCACCAGTC CAGCGCCGAC CTCTCCCTGA CGCTGGCCAT CGCCCGCGAG 2760

GTGGGTAACT CCCTGCTTAC CGACAGCCTG CTGGCGGAAT CCAACCGTCA CCTCAATCAG 2820

ATGTACGGCC TGCTGGAGAG CATGGACGAT GGGGTGATGG CGTGGAACGA ACAGGGCGTG 2880

CTGCAGTTTC TCAATGTTCA GGCGGCGAGA CTGCTGCATC TTGATGCTCA GGCCAGCCAG 2940

GGGAAAAATA TCGCCGATCT GGTGACCCTC CCGGCGCTGC TGCGCCGCGC CATCAAACAC 3000

GCCCGCGGCC TGAATCACGT CGAAGTCACC TTTGAAAGTC AGCATCAGTT TGTCGATGCG 3060

GTGATCACCT TAAAACCGAT TGTCGAGGCG CAAGGCAACA GTTTTATTCT GCTGCTGCAT 3120

CCGGTGGAGC AGATGCGGCA GCTGATGACC AGCCAGCTCG GTAAAGTCAG CCACACCTTT 3180

GAGCAGATGT CTGCCGACGA TCCGGAAACC CGACGCCTGA TCCACTTTGG CCGCCAGGCG 3240

GCGCGCGGCG GCTTCCCGGT GCTACTGTGC GGCGAAGAGG GGGTCGGGAA AGAGCTGCTG 3300

AGCCAGGCTA TTCACAATGA AAGCGAACGG GCGGGCGGCC CCTACATCTC CGTCAACTGC 3360

CAGCTATATG CCGACAGCGT GCTGGGCCAG GACTTTATGG GCAGCGCCCC TACCGACGAT 3420

GAAAATGGTC GCCTGAGCCG CCTTGAGCTG GCCAACGGCG GCACCCTGTT TCTGGAAAAG 3480

ATCGAGTATC TGGCGCCGGA GCTGCAGTCG GCTCTGCTGC AGGTGATTAA GCAGGGCGTG 3540

CTCACCCGCC TCGACGCCCG GCGCCTGATC CCGGTGGATG TGAAGGTGAT TGCCACCACC 3600

ACCGTCGATC TGGCCAATCT GGTGGAACAG AACCGCTTTA GCCGCCAGCT GTACTATGCG 3660

CTGCACTCCT TTGAGATCGT CATCCCGCCG CTGCGCGCCC GACGCAACAG TATTCCGTCG 3720

CTGGTGCATA ACCGGTTGAA GAGCCTGGAG AAGCGTTTCT CTTCGCGACT GAAAGTGGAC 3780

GATGACGCGC TGGCACAGCT GGTGGCCTAC TCGTGGCCGG GGAATGATTT TGAGCTCAAC 3840

AGCGTCATTG AGAATATCGC CATCAGCAGC GACAACGGCC ACATTCGCCT GAGTAATCTG 3900

CCGGAATATC TCTTTTCCGA GCGGCCGGGC GGGGATAGCG CGTCATCGCT GCTGCCGGCC 3960

AGCCTGACTT TTAGCGCCAT CGAAAAGGAA GCTATTATTC ACGCCGCCCG GGTGACCAGC 4020

GGGCGGGTGC AGGAGATGTC GCAGCTGCTC AATATCGGCC GCACCACCCT GTGGCGCAAA 4080

ATGAAGCAGT ACGATATTGA CGCCAGCCAG TTCAAGCGCA AGCATCAGGC CTAGTCTCTT 4140

CGATTCGCGC CATGGAGAAC AGGGCATCCG ACAGGCGATT GCTGTAGCGT TTGAGCGCGT 4200

CGCGCAGCGG ATGCGCGCGG TCCATGGCCG TCAGCAGGCG TTCGAGCCGA CGGGACTGGG 4260

TGCGCGCCAC GTGCAGCTGG GCAGAGGCGA GATTCCTCCC CGGGATCACG AACTGTTTTA 4320

ACGGGCCGCT CTCGGCCATA TTGCGGTCGA TAAGCCGCTC CAGGGCGGTG ATCTCCTCTT 4380

CGCCGATCGT CTGGCTCAGG CGGGTCAGGC CCCGCGCATC GCTGGCCAGT TCAGCCCCCA 4440

GCACGAACAG CGTCTGCTGA ATATGGTGCA GGCTTTCCCG CAGCCCGGCG TCGCGGGTCG 4500

TGGCGTAGCA GACGCCCAGC TGGGATATCA GTTCATCGAC GGTGCCGTAG GCCTCGACGC 4560

GAATATGGTC TTTCTCGATG CGGCTGCCGC CGTACAGGGC GGTGGTGCCT TTATCCCCGG 4620

TGCGGGTATA GATACGATAC ATTCAGTTTC TCTCACTTAA CGGCAGGACT TTAACCAGCT 4680

GCCCGGCGTT GGCGCCGAGC GTACGCAGTT GATCGTCGCT ATCGGTGACG TGTCCGGTAG 4740

CCAGCGGCGC GTCCGCCGGC AGCTGGGCAT GAGTGAGGGC TATCTCGCCG GACGCGCTGA 4800

GCCCGATACC CACCCGCAGG GGCGAGCTTC TGGCCGCCAG GGCGCCCAGC GCAGCGGCGT 4860

CACCGCCTCC GTCATAGGTT ATGGTCTGGC AGGGGACCCC CTGCTCCTCC AGCCCCCAGC 4920

ACAGCTCATT GATGGCGCCG GCATGGTGCC CGCGCGGATC GTAAAACAGG CGTACGCCTG 4980

GCGGTGAAAG CGACATGACG GTCCCCTCGT TAACACTCAG AATGCCTGGC GGAAAATCGC 5040

GGCAATCTCC TGCTCGTTGC CTTTACGCGG GTTCGAGAAC GCATTGCCGT CTTTTAGAGC 5100

CATCTCCGCC ATGTAGGGGA AGTCGGCCTC TTTTACCCCC AGATCGCGCA GATGCTGCGG 5160

AATACCGATA TCCATCGACA GACGCGTGAT AGCGGCGATG GCTTTTTCCG CCGCGTCGAG 5220

AGTGGACAGT CCGGTGATAT TTTCGCCCAT CAGTTCAGCG ATATCGGCGA ATTTCTCCGG 5280

GTTGGCGATC AGGTTGTAGC GCGCCACATG CGGCAGCAGG ACAGCGTTGG CCACGCCGTG 5340

CGGCATGTCG TACAGGCCGC CCAGCTGGTG CGCCATGGCG TGCACGTAGC CGAGGTTGGC 5400

GTTATTGAAA GCCATCCCGG CCAGCAGAGA AGCATAGGCC ATGTTTTCCC GCGCCTGCAG 5460

ATTGCTGCCG AGGGCCACGG CCTGGCGCAG GTTGCGGGCG ATGAGGCGGA TCGCCTGCAT 5520

GGCGGCGGCG TCCGTCACCG GGTTAGCGTC TTTGGAGATA TAGGCCTCTA CGGCGTGGGT 5580

CAGGGCATCC ATCCCGGTCG CCGCGGTCAG GGCGGCCGGT TTACCGATCA TCAGCAGTGG 5640

ATCGTTGATA GAGACCGACG GCAGTTTGCG CCAGCTGACG ATCACAAACT TCACTTTGGT 5700

TTCGGTGTTG GTCAGGACGC AGTGGCGGGT GACCTCGCTG GCGGTGCCGG CGGTGGTATT 5760

GACCGCGACG ATAGGCGGCA GCGGGTTGGT CAGGGTCTCG ATTCCGGCAT ACTGGTACAG 5820

ATCGCCCTCA TGGGTGGCGG CGATGCCGAT GCCTTTGCCG CAATCGTGCG GGCTGCCGCC 5880

GCCCACGGTG ACGATGATGT CGCACTGTTC GCGGCGAAAC ACGGCGAGGC CGTCGCGCAC 5940

GTTGGTGTCT TTCGGGTTCG GCTCGACGCC GTCAAAGATC GCCACCTCGA TCCCGGCCTC 6000

CCGCAGATAA TGCAGGGTTT TGTCCACCGC GCCATCTTTA ATTGCCCGCA GGCCTTTGTC 6060

GGTGACCAGC AGGGCTTTTT TCCCCCCCAG CAGCTGGCAG CGTTCGCCGA CTACGGAAAT 6120

GGCGTTGGGG CCAAAAAAGT TAACGTTTGG CACCAGATAA TCAAACATAC GATAGCTCAT 6180

AATATACCTT CTCGCTTCAG GTTATAATGC GGAAAAACAA TCCAGGGCGC ACTGGGCTAA 6240

TAATTGATCC TGCTCGACCG TACCGCCGCT AACGCCGACG GCGCCAATTA CCTGCTCATT 6300

AAAAATAACT GGCAGGCCGC CGCCAAAAAT AATAATTCGC TGTTGGTTGG TTAGCTGCAG 6360

ACCGTACAGA GATTGTCCTG GCTGGACCGC TGACGTAATT TCATGGGTAC CTTGCTTCAG 6420

GCTGCAGGCG CTCCAGGCTT TATTCAGGGA AATATCGCAG CTGGAGACGA AGGCCTCGTC 6480

CATCCGCTGG ATAAGCAGCG TGTTGCCTCC GCGGTCAACT ACGGAAAACA CCACCGCCAC 6540

GTTGATCTCA GTGGCTTTTT TTTCCACCGC CGCCGCCATT TGCTGGGCGG CGGCCAGGGT 6600

GATTGTCTGA ACTTGTTGGC TCTTGTTCAT CATTCTCTCC CGCACCAGGA TAACGCTGGC 6660

GCGAATAGTC AGTAGGGGGC GATAGTAAAA AACTATTACC ATTCGGTTGG CTTGCTTTAT 6720

TTTTGTCAGC GTTATTTTGT CGCCCGCCAT GATTTAGTCA ATAGGGTTAA AATAGCGTCG 6780

GAAAAACGTA ATTAAGGGCG TTTTTTATTA ATTGATTTAT ATCATTGCGG GCGATCACAT 6840

TTTTTATTTT TGCCGCCGGA GTAAAGTTTC ATAGTGAAAC TGTCGGTAGA TTTCGTGTGC 6900

CAAATTGAAA CGAAATTAAA TTTATTTTTT TCACCACTGG CTCATTTAAA GTTCCGCTAT 6960

TGCCGGTAAT GGCCGGGCGG CAACGACGCT GGCCCGGCGT ATTCGCTACC GTCTGCGGAT 7020

TTCACCTTTT GAGCCGATGA ACAATGAAAA GATCAAAACG ATTTGCAGTA CTGGCCCAGC 7080

GCCCCGTCAA TCAGGACGGG CTGATTGGCG AGTGGCCTGA AGAGGGGCTG ATCGCCATGG 7140

ACAGCCCCTT TGACCCGGTC TCTTCAGTAA AAGTGGACAA CGGTCTGATC GTCGAACTGG 7200

ACGGCAAACG CCGGGACCAG TTTGACATGA TCGACCGATT TATCGCCGAT TACGCGATCA 7260

ACGTTGAGCG CACAGAGCAG GCAATGCGCC TGGAGGCGGT GGAAATAGCC CGTATGCTGG 7320

TGGATATTCA CGTCAGCCGG GAGGAGATCA TTGCCATCAC TACCGCCATC ACGCCGGCCA 7380

AAGCGGTCGA GGTGATGGCG CAGATGAACG TGGTGGAGAT GATGATGGCG CTGCAGAAGA 7440

TGCGTGCCCG CCGGACCCCC TCCAACCAGT GCCACGTCAC CAATCTCAAA GATAATCCGG 7500

TGCAGATTGC CGCTGACGCC GCCGAGGCCG GGATCCGCGG CTTCTCAGAA CAGGAGACCA 7560

CGGTCGGTAT CGCGCGCTAC GCGCCGTTTA ACGCCCTGGC GCTGTTGGTC GGTTCGCAGT 7620

GCGGCCGCCC CGGCGTGTTG ACGCAGTGCT CGGTGGAAGA GGCCACCGAG CTGGAGCTGG 7680

GCATGCGTGG CTTAACCAGC TACGCCGAGA CGGTGTCGGT CTACGGCACC GAAGCGGTAT 7740

TTACCGACGG CGATGATACG CCGTGGTCAA AGGCGTTCCT CGCCTCGGCC TACGCCTCCC 7800

GCGGGTTGAA AATGCGCTAC ACCTCCGGCA CCGGATCCGA AGCGCTGATG GGCTATTCGG 7860

AGAGCAAGTC GATGCTCTAC CTCGAATCGC GCTGCATCTT CATTACTAAA GGCGCCGGGG 7920

TTCAGGGACT GCAAAACGGC GCGGTGAGCT GTATCGGCAT GACCGGCGCT GTGCCGTCGG 7980

GCATTCGGGC GGTGCTGGCG GAAAACCTGA TCGCCTCTAT GCTCGACCTC GAAGTGGCGT 8040

CCGCCAACGA CCAGACTTTC TCCCACTCGG ATATTCGCCG CACCGCGCGC ACCCTGATGC 8100

AGATGCTGCC GGGCACCGAC TTTATTTTCT CCGGCTACAG CGCGGTGCCG AACTACGACA 8160

ACATGTTCGC CGGCTCGAAC TTCGATGCGG AAGATTTTGA TGATTACAAC ATCCTGCAGC 8220

GTGACCTGAT GGTTGACGGC GGCCTGCGTC CGGTGACCGA GGCGGAAACC ATTGCCATTC 8280

GCCAGAAAGC GGCGCGGGCG ATCCAGGCGG TTTTCCGCGA GCTGGGGCTG CCGCCAATCG 8340

CCGACGAGGA GGTGGAGGCC GCCACCTACG CGCACGGCAG CAACGAGATG CCGCCGCGTA 8400

ACGTGGTGGA GGATCTGAGT GCGGTGGAAG AGATGATGAA GCGCAACATC ACCGGCCTCG 8460

ATATTGTCGG CGCGCTGAGC CGCAGCGGCT TTGAGGATAT CGCCAGCAAT ATTCTCAATA 8520

TGCTGCGCCA GCGGGTCACC GGCGATTACC TGCAGACCTC GGCCATTCTC GATCGGCAGT 8580

TCGAGGTGGT GAGTGCGGTC AACGACATCA ATGACTATCA GGGGCCGGGC ACCGGCTATC 8640

GCATCTCTGC CGAACGCTGG GCGGAGATCA AAAATATTCC GGGCGTGGTT CAGCCCGACA 8700

CCATTGAATA AGGCGGTATT CCTGTGCAAC AGACAACCCA AATTCAGCCC TCTTTTACCC 8760

TGAAAACCCG CGAGGGCGGG GTAGCTTCTG CCGATGAACG CGCCGATGAA GTGGTGATCG 8820

GCGTCGGCCC TGCCTTCGAT AAACACCAGC ATCACACTCT GATCGATATG CCCCATGGCG 8880

CGATCCTCAA AGAGCTGATT GCCGGGGTGG AAGAAGAGGG GCTTCACGCC CGGGTGGTGC 8940

GCATTCTGCG CACGTCCGAC GTCTCCTTTA TGGCCTGGGA TGCGGCCAAC CTGAGCGGCT 9000

CGGGGATCGG CATCGGTATC CAGTCGAAGG GGACCACGGT CATCCATCAG CGCGATCTGC 9060

TGCCGCTCAG CAACCTGGAG CTGTTCTCCC AGGCGCCGCT GCTGACGCTG GAGACCTACC 9120

GGCAGATTGG CAAAAACGCT GCGCGCTATG CGCGCAAAGA GTCACCTTCG CCGGTGCCGG 9180

TGGTGAACGA TCAGATGGTG CGGCCGAAAT TTATGGCCAA AGCCGCGCTA TTTCATATCA 9240

AAGAGACCAA ACATGTGGTG CAGGACGCCG AGCCCGTCAC CCTGCACATC GACTTAGTAA 9300

GGGAGTGACC ATGAGCGAGA AAACCATGCG CGTGCAGGAT TATCCGTTAG CCACCCGCTG 9360

CCCGGAGCAT ATCCTGACGC CTACCGGCAA ACCATTGACC GATATTACCC TCGAGAAGGT 9420

GCTCTCTGGC GAGGTGGGCC CGCAGGATGT GCGGATCTCC CGCCAGACCC TTGAGTACCA 9480

GGCGCAGATT GCCGAGCAGA TGCAGCGCCA TGCGGTGGCG CGCAATTTCC GCCGCGCGGC 9540

GGAGCTTATC GCCATTCCTG ACGAGCGCAT TCTGGCTATC TATAACGCGC TGCGCCCGTT 9600

CCGCTCCTCG CAGGCGGAGC TGCTGGCGAT CGCCGACGAG CTGGAGCACA CCTGGCATGC 9660

GACAGTGAAT GCCGCCTTTG TCCGGGAGTC GGCGGAAGTG TATCAGCAGC GGCATAAGCT 9720

GCGTAAAGGA AGCTAAGCGG AGGTCAGCAT GCCGTTAATA GCCGGGATTG ATATCGGCAA 9780

CGCCACCACC GAGGTGGCGC TGGCGTCCGA CTACCCGCAG GCGAGGGCGT TTGTTGCCAG 9840

CGGGATCGTC GCGACGACGG GCATGAAAGG GACGCGGGAC AATATCGCCG GGACCCTCGC 9900

CGCGCTGGAG CAGGCCCTGG CGAAAACACC GTGGTCGATG AGCGATGTCT CTCGCATCTA 9960

TCTTAACGAA GCCGCGCCGG TGATTGGCGA TGTGGCGATG GAGACCATCA CCGAGACCAT 10020

TATCACCGAA TCGACCATGA TCGGTCATAA CCCGCAGACG CCGGGCGGGG TGGGCGTTGG 10080

CGTGGGGACG ACTATCGCCC TCGGGCGGCT GGCGACGCTG CCGGCGGCGC AGTATGCCGA 10140

GGGGTGGATC GTACTGATTG ACGACGCCGT CGATTTCCTT GACGCCGTGT GGTGGCTCAA 10200

TGAGGCGCTC GACCGGGGGA TCAACGTGGT GGCGGCGATC CTCAAAAAGG ACGACGGCGT 10260

GCTGGTGAAC AACCGCCTGC GTAAAACCCT GCCGGTGGTG GATGAAGTGA CGCTGCTGGA 10320

GCAGGTCCCC GAGGGGGTAA TGGCGGCGGT GGAAGTGGCC GCGCCGGGCC AGGTGGTGCG 10380

GATCCTGTCG AATCCCTACG GGATCGCCAC CTTCTTCGGG CTAAGCCCGG AAGAGACCCA 10440

GGCCATCGTC CCCATCGCCC GCGCCCTGAT TGGCAACCGT TCCGCGGTGG TGCTCAAGAC 10500

CCCGCAGGGG GATGTGCAGT CGCGGGTGAT CCCGGCGGGC AACCTCTACA TTAGCGGCGA 10560

AAAGCGCCGC GGAGAGGCCG ATGTCGCCGA GGGCGCGGAA GCCATCATGC AGGCGATGAG 10620

CGCCTGCGCT CCGGTACGCG ACATCCGCGG CGAACCGGGC ACCCACGCCG GCGGCATGCT 10680

TGAGCGGGTG CGCAAGGTAA TGGCGTCCCT GACCGGCCAT GAGATGAGCG CGATATACAT 10740

CCAGGATCTG CTGGCGGTGG ATACGTTTAT TCCGCGCAAG GTGCAGGGCG GGATGGCCGG 10800

CGAGTGCGCC ATGGAGAATG CCGTCGGGAT GGCGGCGATG GTGAAAGCGG ATCGTCTGCA 10860

AATGCAGGTT ATCGCCCGCG AACTGAGCGC CCGACTGCAG ACCGAGGTGG TGGTGGGCGG 10920

CGTGGAGGCC AACATGGCCA TCGCCGGGGC GTTAACCACT CCCGGCTGTG CGGCGCCGCT 10980

GGCGATCCTC GACCTCGGCG CCGGCTCGAC GGATGCGGCG ATCGTCAACG CGGAGGGGCA 11040

GATAACGGCG GTCCATCTCG CCGGGGCGGG GAATATGGTC AGCCTGTTGA TTAAAACCGA 11100

GCTGGGCCTC GAGGATCTTT CGCTGGCGGA AGCGATAAAA AAATACCCGC TGGCCAAAGT 11160

GGAAAGCCTG TTCAGTATTC GTCACGAGAA TGGCGCGGTG GAGTTCTTTC GGGAAGCCCT 11220

CAGCCCGGCG GTGTTCGCCA AAGTGGTGTA CATCAAGGAG GGCGAACTGG TGCCGATCGA 11280

TAACGCCAGC CCGCTGGAAA AAATTCGTCT CGTGCGCCGG CAGGCGAAAG AGAAAGTGTT 11340

TGTCACCAAC TGCCTGCGCG CGCTGCGCCA GGTCTCACCC GGCGGTTCCA TTCGCGATAT 11400

CGCCTTTGTG GTGCTGGTGG GCGGCTCATC GCTGGACTTT GAGATCCCGC AGCTTATCAC 11460

GGAAGCCTTG TCGCACTATG GCGTGGTCGC CGGGCAGGGC AATATTCGGG GAACAGAAGG 11520

GCCGCGCAAT GCGGTCGCCA CCGGGCTGCT ACTGGCCGGT CAGGCGAATT AAACGGGCGC 11580




TCGCGCCAGC CTCTCTCTTT AACGTGCTAT TTCAGGATGC CGATAATGAA CCAGACTTCT 11640

ACCTTAACCG GGCAGTGCGT GGCCGAGTTT CTTGGCACCG GATTGCTCAT TTTCTTCGGC 11700

GCGGGCTGCG TCGCTGCGCT GCGGGTCGCC GGGGCCAGCT TTGGTCAGTG GGAGATCAGT 11760

ATTATCTGGG GCCTTGGCGT CGCCATGGCC ATCTACCTGA CGGCCGGTGT CTCCGGCGCG 11820

CACCTAAATC CGGCGGTGAC CATTGCCCTG TGGCTGTTCG CCTGTTTTGA ACGCCGCAAG 11880

GTGCTGCCGT TTATTGTTGC CCAGACGGCC GGGGCCTTCT GCGCCGCCGC GCTGGTGTAT 11940

GGGCTCTATC GCCAGCTGTT TCTCGATCTT GAACAGAGTC AGCATATCGT GCGCGGCACT 12000

GCCGCCAGTC TTAACCTGGC CGGGGTCTTT TCCACGTACC CGCATCCACA TATCACTTTT 12060

ATACAAGCGT TTGCCGTGGA GACCACCATC ACGGCAATCC TGATGGCGAT GATCATGGCC 12120

CTGACCGACG ACGGCAACGG AATTC 12145
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 94 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
AGCTTAGGAG TCTAGAATAT TGAGCTCGAA TTCCCGGGCA TGCGGTACCG GATCCAGAAA 60 
AAAGCCCGCA CCTGACAGTG CGGGCTTTTT TTTT 94
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 37 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
GGAATTCAGA TCTCAGCAAT GAGCGAGAAA ACCATGC 37
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GCTCTAGATT AGCTTCCTTT ACGCAGC 27



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
GGCCAAGCTT AAGGAGGTTA ATTAAATGAA AAG 33 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
GCTCTAGATT ATTCAATGGT GTCGGG 26

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
GCGCCGTCTA GAATTATGAG CTATCGTATG TTTGATTATC TG 42
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
TCTGATACGG GATCCTCAGA ATGCCTGGCG GAAAAT 36
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 51 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
GCGCGGATCC AGGAGTCTAG AATTATGGGA TTGACTACTA AACCTCTATC T 51 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GATACGCCCG GGTTACCATT TCAACAGATC GTCCTT 36
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
TCGACGAATT CAGGAGGA 18 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CTAGTCCTCC TGAATTCG 18
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
CTAGTAAGGA GGACAATTC 19 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
CATGGAATTG TCCTCCTTA 19
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 271 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: GPP1 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Met Lys Arg Phe Asn Val Leu Lys Tyr Ile Arg Thr Thr Lys Ala Asn
1 5 10 15

Ile Gln Thr Ile Ala Met Pro Leu Thr Thr Lys Pro Leu Ser Leu Lys
20 25 30

Ile Asn Ala Ala Leu Phe Asp Val Asp Gly Thr Ile Ile Ile Ser Gln
35 40 45

Pro Ala Ile Ala Ala Phe Trp Arg Asp Phe Gly Lys Asp Lys Pro Tyr
50 55 60

Phe Asp Ala Glu His Val Ile His Ile Ser His Gly Trp Arg Thr Tyr
65 70 75 80

Asp Ala Ile Ala Lys Phe Ala Pro Asp Phe Ala Asp Glu Glu Tyr Val
85 90 95

Asn Lys Leu Glu Gly Glu Ile Pro Glu Lys Tyr Gly Glu His Ser Ile
100 105 110

Glu Val Pro Gly Ala Val Lys Leu Cys Asn Ala Leu Asn Ala Leu Pro
115 120 125

Lys Glu Lys Trp Ala Val Ala Thr Ser Gly Thr Arg Asp Met Ala Lys
130 135 140

Lys Trp Phe Asp Ile Leu Lys Ile Lys Arg Pro Glu Tyr Phe Ile Thr
145 150 155 160

Ala Asn Asp Val Lys Gln Gly Lys Pro His Pro Glu Pro Tyr Leu Lys
165 170 175

Gly Arg Asn Gly Leu Gly Phe Pro Ile Asn Glu Gln Asp Pro Ser Lys
180 185 190

Ser Lys Val Val Val Phe Glu Asp Ala Pro Ala Gly Ile Ala Ala Gly
195 200 205

Lys Ala Ala Gly Cys Lys Ile Val Gly Ile Ala Thr Thr Phe Asp Leu
210 215 220

Asp Phe Leu Lys Glu Lys Gly Cys Asp Ile Ile Val Lys Asn His Glu
225 230 235 240

Ser Ile Arg Val Gly Glu Tyr Asn Ala Glu Thr Asp Glu Val Glu Leu
245 250 255

Ile Phe Asp Asp Tyr Leu Tyr Ala Lys Asp Asp Leu Leu Lys Trp
260 265 270

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 555 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAB1

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Met Lys Arg Ser Lys Arg Phe Ala Val Leu Ala Gln Arg Pro Val Asn
1 5 10 15

Gln Asp Gly Leu Ile Gly Glu Trp Pro Glu Glu Gly Leu Ile Ala Met
20 25 30

Asp Ser Pro Phe Asp Pro Val Ser Ser Val Lys Val Asp Asn Gly Leu
35 40 45

Ile Val Glu Leu Asp Gly Lys Arg Arg Asp Gln Phe Asp Met Ile Asp
50 55 60

Arg Phe Ile Ala Asp Tyr Ala Ile Asn Val Glu Arg Thr Glu Gln Ala
65 70 75 80

Met Arg Leu Glu Ala Val Glu Ile Ala Arg Met Leu Val Asp Ile His
85 90 95

Val Ser Arg Glu Glu Ile Ile Ala Ile Thr Thr Ala Ile Thr Pro Ala
100 105 110

Lys Ala Val Glu Val Met Ala Gln Met Asn Val Val Glu Met Met Met
115 120 125

Ala Leu Gln Lys Met Arg Ala Arg Arg Thr Pro Ser Asn Gln Cys His
130 135 140

Val Thr Asn Leu Lys Asp Asn Pro Val Gln Ile Ala Ala Asp Ala Ala
145 150 155 160

Glu Ala Gly Ile Arg Gly Phe Ser Glu Gln Glu Thr Thr Val Gly Ile
165 170 175

Ala Arg Tyr Ala Pro Phe Asn Ala Leu Ala Leu Leu Val Gly Ser Gln
180 185 190

Cys Gly Arg Pro Gly Val Leu Thr Gln Cys Ser Val Glu Glu Ala Thr
195 200 205

Glu Leu Glu Leu Gly Met Arg Gly Leu Thr Ser Tyr Ala Glu Thr Val
210 215 220

Ser Val Tyr Gly Thr Glu Ala Val Phe Thr Asp Gly Asp Asp Thr Pro
225 230 235 240

Trp Ser Lys Ala Phe Leu Ala Ser Ala Tyr Ala Ser Arg Gly Leu Lys
245 250 255

Met Arg Tyr Thr Ser Gly Thr Gly Ser Glu Ala Leu Met Gly Tyr Ser
260 265 270

Glu Ser Lys Ser Met Leu Tyr Leu Glu Ser Arg Cys Ile Phe Ile Thr
275 280 285

Lys Gly Ala Gly Val Gln Gly Leu Gln Asn Gly Ala Val Ser Cys Ile
290 295 300

Gly Met Thr Gly Ala Val Pro Ser Gly Ile Arg Ala Val Leu Ala Glu
305 310 315 320

Asn Leu Ile Ala Ser Met Leu Asp Leu Glu Val Ala Ser Ala Asn Asp
325 330 335

Gln Thr Phe Ser His Ser Asp Ile Arg Arg Thr Ala Arg Thr Leu Met
340 345 350

Gln Met Leu Pro Gly Thr Asp Phe Ile Phe Ser Gly Tyr Ser Ala Val
355 360 365

Pro Asn Tyr Asp Asn Met Phe Ala Gly Ser Asn Phe Asp Ala Glu Asp
370 375 380

Phe Asp Asp Tyr Asn Ile Leu Gln Arg Asp Leu Met Val Asp Gly Gly
385 390 395 400

Leu Arg Pro Val Thr Glu Ala Glu Thr Ile Ala Ile Arg Gln Lys Ala
405 410 415

Ala Arg Ala Ile Gln Ala Val Phe Arg Glu Leu Gly Leu Pro Pro Ile
420 425 430

Ala Asp Glu Glu Val Glu Ala Ala Thr Tyr Ala His Gly Ser Asn Glu
435 440 445

Met Pro Pro Arg Asn Val Val Glu Asp Leu Ser Ala Val Glu Glu Met
450 455 460

Met Lys Arg Asn Ile Thr Gly Leu Asp Ile Val Gly Ala Leu Ser Arg
465 470 475 480

Ser Gly Phe Glu Asp Ile Ala Ser Asn Ile Leu Asn Met Leu Arg Gln
485 490 495

Arg Val Thr Gly Asp Tyr Leu Gln Thr Ser Ala Ile Leu Asp Arg Gln
500 505 510

Phe Glu Val Val Ser Ala Val Asn Asp Ile Asn Asp Tyr Gln Gly Pro
515 520 525

Gly Thr Gly Tyr Arg Ile Ser Ala Glu Arg Trp Ala Glu Ile Lys Asn
530 535 540

Ile Pro Gly Val Val Gln Pro Asp Thr Ile Glu
545 550 555

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 194 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAB2 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Met Gln Gln Thr Thr Gln Ile Gln Pro Ser Phe Thr Leu Lys Thr Arg
1 5 10 15

Glu Gly Gly Val Ala Ser Ala Asp Glu Arg Ala Asp Glu Val Val Ile
20 25 30

Gly Val Gly Pro Ala Phe Asp Lys His Gln His His Thr Leu Ile Asp
35 40 45

Met Pro His Gly Ala Ile Leu Lys Glu Leu Ile Ala Gly Val Glu Glu
50 55 60

Glu Gly Leu His Ala Arg Val Val Arg Ile Leu Arg Thr Ser Asp Val
65 70 75 80

Ser Phe Met Ala Trp Asp Ala Ala Asn Leu Ser Gly Ser Gly Ile Gly
85 90 95

Ile Gly Ile Gln Ser Lys Gly Thr Thr Val Ile His Gln Arg Asp Leu
100 105 110

Leu Pro Leu Ser Asn Leu Glu Leu Phe Ser Gln Ala Pro Leu Leu Thr
115 120 125

Leu Glu Thr Tyr Arg Gln Ile Gly Lys Asn Ala Ala Arg Tyr Ala Arg
130 135 140

Lys Glu Ser Pro Ser Pro Val Pro Val Val Asn Asp Gln Met Val Arg
145 150 155 160

Pro Lys Phe Met Ala Lys Ala Ala Leu Phe His Ile Lys Glu Thr Lys
165 170 175

His Val Val Gln Asp Ala Glu Pro Val Thr Leu His Ile Asp Leu Val
180 185 190

Arg Glu

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 140 amino acids

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: DHAB3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

Met Ser Glu Lys Thr Met Arg Val Gln Asp Tyr Pro Leu Ala Thr Arg
1 5 10 15

Cys Pro Glu His Ile Leu Thr Pro Thr Gly Lys Pro Leu Thr Asp Ile
20 25 30

Thr Leu Glu Lys Val Leu Ser Gly Glu Val Gly Pro Gln Asp Val Arg
35 40 45

Ile Ser Arg Gln Thr Leu Glu Tyr Gln Ala Gln Ile Ala Glu Gln Met
50 55 60

Gln His Ala Val Ala Arg Asn Phe Arg Arg Ala Ala Glu Leu Ile Ala
65 70 75 80

Ile Pro Asp Glu Arg Ile Leu Ala Ile Tyr Asn Ala Leu Arg Pro Phe
85 90 95

Arg Ser Ser Gln Ala Glu Leu Leu Ala Ile Ala Asp Glu Leu Glu His
100 105 110

Thr Trp His Ala Thr Val Asn Ala Ala Phe Val Arg Glu Ser Ala Glu
115 120 125

Val Tyr Gln Gln Arg His Lys Leu Arg Lys Gly Ser
130 135 140

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 387 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(vi) ORIGINAL SOURCE:

(A) ORGANISM: DHAT

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

Met Ser Tyr Arg Met Phe Asp Tyr Leu Val Pro Asn Val Asn Phe Phe
1 5 10 15

Gly Pro Asn Ala Ile Ser Val Val Gly Glu Arg Cys Gln Leu Leu Gly
20 25 30

Gly Lys Lys Ala Leu Leu Val Thr Asp Lys Gly Leu Arg Ala Ile Lys
35 40 45

Asp Gly Ala Val Asp Lys Thr Leu His Tyr Leu Arg Glu Ala Gly Ile
50 55 60

Glu Val Ala Ile Phe Asp Gly Val Glu Pro Asn Pro Lys Asp Thr Asn
65 70 75 80

Val Arg Asp Gly Leu Ala Val Phe Arg Arg Glu Gln Cys Asp Ile Ile
85 90 95

Val Thr Val Gly Gly Gly Ser Pro His Asp Cys Gly Lys Gly Ile Gly
100 105 110 



Ile Ala Ala Thr His Glu Gly Asp Leu Tyr Gln Tyr Ala Gly Ile Glu
115 120 125

Thr Leu Thr Asn Pro Leu Pro Pro Ile Val Ala Val Asn Thr Thr Ala
130 135 140

Gly Thr Ala Ser Glu Val Thr Arg His Cys Val Leu Thr Asn Thr Glu
145 150 155 160

Thr Lys Val Lys Phe Val Ile Val Ser Trp Arg Lys Leu Pro Ser Val
165 170 175

Ser Ile Asn Asp Pro Leu Leu Met Ile Gly Lys Pro Ala Ala Leu Thr
180 185 190

Ala Ala Thr Gly Met Asp Ala Leu Thr His Ala Val Glu Ala Tyr Ile
195 200 205

Ser Lys Asp Ala Asn Pro Val Thr Asp Ala Ala Ala Met Gln Ala Ile
210 215 220

Arg Leu Ile Ala Arg Asn Leu Arg Gln Ala Val Ala Leu Gly Ser Asn
225 230 235 240

Leu Gln Ala Arg Glu Asn Met Ala Tyr Ala Ser Leu Leu Ala Gly Met
245 250 255

Ala Phe Asn Asn Ala Asn Leu Gly Tyr Val His Ala Met Ala His Gln
260 265 270

Leu Gly Gly Leu Tyr Asp Met Pro His Gly Val Ala Asn Ala Val Leu
275 280 285

Leu Pro His Val Ala Arg Tyr Asn Leu Ile Ala Asn Pro Glu Lys Phe
290 295 300

Ala Asp Ile Ala Glu Leu Met Gly Glu Asn Ile Thr Gly Leu Ser Thr
305 310 315 320

Leu Asp Ala Ala Glu Lys Ala Ile Ala Ala Ile Thr Arg Leu Ser Met
325 330 335

Asp Ile Gly Ile Pro Gln His Leu Arg Asp Leu Gly Val Lys Glu Ala
340 345 350

Asp Phe Pro Tyr Met Ala Glu Met Ala Leu Lys Asp Gly Asn Ala Phe
355 360 365

Ser Asn Pro Arg Lys Gly Asn Glu Gln Glu Ile Ala Ala Ile Phe Arg
370 375 380

Gln Ala Phe
385

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
GCGAATTCAT GAGCTATCGT ATGTTTG 27
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
GCGAATTCAG AATGCCTGGC GGAAAATC 28

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
GGGAATTCAT GAGCGAGAAA ACCATGCG 28
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
GCGAATTCTT AGCTTCCTTT ACGCAGC 27 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
GCGAATTCAT GCAACAGACA ACCCAAATTC 30 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 
GCGAATTCAC TCCCTTACTA AGTCG 25 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
GGGAATTCAT GAAAAGATCA AAACGATTTG 30
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
GCGAATTCTT ATTCAATGGT GTCGGGCTG 29 



(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
TTGATAATAT AACCATGGCT GCTGCTGCTG ATAG 34 
(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47: 

GTATGATATG TTATCTTGGA TCCAATAAAT CTAATCTTC 39

(2) INFORMATION FOR SEQ ID NO: 48:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
CATGACTAGT AAGGAGGACA ATTC 24
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 
CATGGAATTG TCCTCCTTAC TAGT 24



